XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition (718 page)

Author: Michael Kay

My first step was to load the DTD into Stylus Studio and convert it to a schema. You could equally well do this using other tools such as XML Spy or oXygen. In fact Stylus Studio offers a choice of two converters, one of which is native to Stylus, the other a packaging of James Clark's trang program. I found that the native tool, with all options defaulted, did a very satisfactory job: the output is in the download directory as rawschema-stylus.xsd.

I then refined this schema by hand. The changes fell into the following categories:

A number of the top-level elements have a similar structure; in particular, they all have children such as ExternalID, Submitter, Note, Evidence, Enrichment, and Changed. Because these common fields appear at the end, it's not possible to represent this in XML Schema by extending a common supertype (extension can only append to the end of the base type's content model), but the common data can be extracted into a named model group, which I called CommonFields. Similarly, complex types such as BasicLinkType and ParentType were created in cases where several elements have the same content model.

Adding an import for the schema for the XML namespace, since the schema uses the xml:lang attribute.

Adding simple type definitions for GeneralDate and StandardDate, as described below, and for other shared types such as TimeType.
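
The first two refinements can be pictured with a small schema fragment. This is only a sketch under stated assumptions, not the schema from the download: the record element ExampleRec and its Title child are hypothetical, the xs:anyType placeholders and occurrence limits are guesses, and attaching xml:lang to the record itself is just for illustration. Only the group name CommonFields and its six member names come from the text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Import the schema for the XML namespace so that xml:lang can be referenced. -->
  <xs:import namespace="http://www.w3.org/XML/1998/namespace"
             schemaLocation="http://www.w3.org/2001/xml.xsd"/>

  <!-- The trailing fields shared by the top-level elements, as a named model group.
       Types and occurrence limits are placeholders (assumptions). -->
  <xs:group name="CommonFields">
    <xs:sequence>
      <xs:element name="ExternalID" type="xs:anyType" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="Submitter" type="xs:anyType" minOccurs="0"/>
      <xs:element name="Note" type="xs:anyType" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="Evidence" type="xs:anyType" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="Enrichment" type="xs:anyType" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="Changed" type="xs:anyType" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:group>

  <!-- Hypothetical record: its own fields come first, the shared group comes last.
       A base type holding the common fields could not be extended this way,
       because extension adds new content after the base content, not before it. -->
  <xs:element name="ExampleRec">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Title" type="xs:string" minOccurs="0"/>
        <xs:group ref="CommonFields"/>
      </xs:sequence>
      <xs:attribute ref="xml:lang"/>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

Any element that ends with the same fields can reference the group in the same way, so the shared content model is written once.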

An interesting feature of this data is that the schema is very permissive. For example, it specifies a default format for dates in the form DD MMM YYYY (such as 18 APR 1924), which has long been the convention used by genealogists. However, it doesn't insist that the date of an event takes this form. It's quite OK, for example, to replace the last digit of the year by a question mark, perhaps to reflect the fact that the digit is difficult to decipher on an original manuscript. There are certain approved conventions such as preceding the date with ABT to indicate that the date is approximate, or EST to say that it is estimated, but there are no absolute rules. The golden rule in genealogy is that when you find information in a source document, you should be able to transcribe it as faithfully to the original as you possibly can, and a schema that imposes restrictions on your ability to do this is considered a bad thing. If you find an old church register in which a date of baptism is recorded as Septuagesima 1582, then you should be able to enter that in your database. I'll come back to the modeling of dates in the schema on page 1057.
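
One way to combine a preferred format with this kind of permissiveness is a pattern-restricted type alongside a union that also admits any string. The fragment below is a sketch of that idea only. It borrows the type names StandardDate and GeneralDate mentioned earlier, but the pattern and the union are assumptions and may differ from the definitions actually used in the schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- The conventional DD MMM YYYY form, for example "18 APR 1924".
       The exact pattern is an assumption. -->
  <xs:simpleType name="StandardDate">
    <xs:restriction base="xs:string">
      <xs:pattern value="(0?[1-9]|[12][0-9]|3[01]) (JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC) [0-9]{4}"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Any date as transcribed: a standard date if it matches the pattern,
       otherwise an arbitrary string such as "ABT 1924", "18 APR 192?"
       or "Septuagesima 1582". Member types are tried in the order listed. -->
  <xs:simpleType name="GeneralDate">
    <xs:union memberTypes="StandardDate xs:string"/>
  </xs:simpleType>

</xs:schema>
```

With a union like this, validation never rejects a faithfully transcribed date, while a schema-aware processor can still tell which values happen to follow the standard form.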

In GEDCOM, there is no formal way of linking one file to another. XML, of course, creates wonderful opportunities to define how your family tree links to someone else's. But the linking isn't as easy as it sounds (nothing is, in genealogy) because of the problems of maintaining version integrity between two datasets that are changing independently. So I'll avoid getting into that area and stick to the model that has the whole family tree in one XML document.

The GEDCOM 6.0 Schema

Let's now take a quick look at some aspects of the XML Schema which I created for GEDCOM 6.0. In principle, because it's converted from the DTD, it covers all aspects of the specification; however, in improving the schema to describe the specification more precisely and more usefully, I concentrated on the parts that we are actually using in the application in this chapter: in particular, the three main object types individual, event, and family, and the three main properties, namely date, place, and personal name.

Individuals

Here is the element declaration for an IndividualRec:

IndivName gives the name of the individual. Gender has the obvious meaning; DeathStatus is for recording information such as “died in infancy” when no specific death event is known. PersInfo allows recording of arbitrary personal information such as occupation and religion. AssocIndiv is for links to related individuals where the relationships cannot be expressed directly through Family objects (for example, links to godparents). DupIndiv is interesting: it allows an assertion that this IndividualRec refers to the same individual as another IndividualRec. This is very useful when combining data sets compiled by different genealogists; merging the two records into one can be very difficult if there are inconsistencies in the data, and it can prove very difficult to unmerge the data later if they are found to be different individuals after all.

Within the CommonFields group, which is also present in other top-level elements, ExternalID is for reference numbers that identify the individual in external databases; Submitter is the person who created this record; Note is for arbitrary comments; Evidence says where the information came from; Enrichment is for inline documentation such as photographs or transcripts of original documents, and Changed is for a change history of this record.
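
Putting this description together, a declaration with this shape might look roughly as follows. This is a reconstruction from the prose, not the declaration from the schema itself: the child element names and their order follow the text, while every minOccurs and maxOccurs value, the xs:anyType placeholder types, and the absence of attributes are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Shared trailing fields; placeholder types and limits (assumptions). -->
  <xs:group name="CommonFields">
    <xs:sequence>
      <xs:element name="ExternalID" type="xs:anyType" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="Submitter" type="xs:anyType" minOccurs="0"/>
      <xs:element name="Note" type="xs:anyType" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="Evidence" type="xs:anyType" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="Enrichment" type="xs:anyType" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="Changed" type="xs:anyType" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:group>

  <xs:element name="IndividualRec">
    <xs:complexType>
      <xs:sequence>
        <!-- Fields specific to an individual; cardinalities are guesses. -->
        <xs:element name="IndivName" type="xs:anyType" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element name="Gender" type="xs:anyType" minOccurs="0"/>
        <xs:element name="DeathStatus" type="xs:anyType" minOccurs="0"/>
        <xs:element name="PersInfo" type="xs:anyType" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element name="AssocIndiv" type="xs:anyType" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element name="DupIndiv" type="xs:anyType" minOccurs="0" maxOccurs="unbounded"/>
        <!-- The common fields always come last. -->
        <xs:group ref="CommonFields"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

In a real schema each placeholder would be replaced by a proper type; the point here is only the ordering, with the individual-specific elements first and the CommonFields group closing the content model.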
