XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition

Author: Michael Kay

Elements with Mixed Content

The type of an element that can contain child elements is called a complex type with complex content. Such types essentially fall into three categories, called empty content, mixed content, and element-only content. Mixed content allows intermingled text and child elements, and is often found in narrative XML documents, allowing markup such as:

The population of London reached 5,572,000 in 1891, and had risen further to 7,160,000 by 1911.
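With inline markup, a paragraph like this might be written as shown below. This is only a sketch: the element names para, city, number, and year are assumptions made for illustration.

```xml
<!-- Mixed content: text interleaved with child elements.
     All element names here are illustrative assumptions. -->
<para>The population of <city>London</city> reached
<number>5,572,000</number> in <year>1891</year>, and had risen
further to <number>7,160,000</number> by <year>1911</year>.</para>
```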

The type of this element could be declared in a schema as:
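A minimal sketch of such a declaration follows. It assumes a paragraph element named para whose permitted children are city, number, and year; all four names are illustrative assumptions.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- mixed="true" permits character data between the child elements -->
  <xs:element name="para">
    <xs:complexType mixed="true">
      <!-- any number of the listed children, in any order -->
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="city"/>
        <xs:element ref="number"/>
        <xs:element ref="year"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>

  <!-- xs:token for number because values such as "5,572,000"
       contain commas and so are not valid xs:integer values -->
  <xs:element name="city" type="xs:token"/>
  <xs:element name="number" type="xs:token"/>
  <xs:element name="year" type="xs:gYear"/>

</xs:schema>
```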

In practice, the list of permitted child elements would probably be much longer than this, and a common technique is to define substitution groups, which allow a list of such elements to be referred to by a single name.
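As a sketch of how a substitution group could be used here, the schema below declares an abstract head element and makes the inline elements members of its group. The names inline, para, city, number, and year are all hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Abstract head of the group: it cannot appear in a document itself.
       With no type attribute its type is xs:anyType, so members may
       have any type. -->
  <xs:element name="inline" abstract="true"/>

  <!-- Members of the group may appear wherever "inline" is referenced -->
  <xs:element name="city" type="xs:token" substitutionGroup="inline"/>
  <xs:element name="number" type="xs:token" substitutionGroup="inline"/>
  <xs:element name="year" type="xs:gYear" substitutionGroup="inline"/>

  <!-- The content model now names only the head element -->
  <xs:element name="para">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element ref="inline" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

Adding a further inline element then means adding one declaration to the group, without touching the content model of para.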

Narrative documents tend to be less constrained than documents holding structured data such as purchase orders and invoices, and while schema validation is still very useful, the type annotations generated as a result of validation aren't generally so important when the time comes to process the data using XSLT; the names of the elements are usually more significant than their types. However, there is plenty of potential for using the types, especially if the schema is designed with this in mind.

When schemas are used primarily for validation, the tendency is to think of types in terms of the form that values assume. For example, it is natural to define the type of the element holding the city name (as used in the example above) as derived from xs:token by restriction, because the names of cities are strings, perhaps consisting of multiple words, in which spaces are not significant. Once types start to be used for processing information (which is what you are doing when you use XSLT), it's also useful to think about what the value actually means. The content of that element is not just a string of characters; it is the name of a geographical place, a place that has a location on the Earth's surface, that is in a particular country, and that may figure in postal addresses. If you have other similar elements, it might be a good idea to define a single type for all of them. Even if this type doesn't have any particular purpose for validation, because it doesn't define any extra constraints on the content, it can potentially be useful when writing XSLT templates because it groups a number of elements that belong together semantically.
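A minimal sketch of this idea, assuming a schema-aware XSLT 2.0 processor and a source document that has been validated against the schema. The type name place-name and the element names city and country are hypothetical; the schema is written inline inside xsl:import-schema only to keep the example in one piece.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:import-schema>
    <xs:schema>
      <!-- Adds no constraints beyond xs:token; it exists to group
           elements that mean the same kind of thing -->
      <xs:simpleType name="place-name">
        <xs:restriction base="xs:token"/>
      </xs:simpleType>
      <xs:element name="city" type="place-name"/>
      <xs:element name="country" type="place-name"/>
    </xs:schema>
  </xsl:import-schema>

  <!-- One rule handles every element annotated with the shared type -->
  <xsl:template match="element(*, place-name)">
    <place kind="{local-name()}">
      <xsl:value-of select="."/>
    </place>
  </xsl:template>

</xsl:stylesheet>
```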

Elements with Element-Only Content

This category covers most of the “wrapper” elements that are found in data-oriented XML. A typical example is the outer element in a structure such as:

Michael

Howard

Kay

1951-10-11

Hannover
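With markup, a record holding these values might look like the sketch below. The element names (person, name, given, family, date-of-birth, place-of-birth) and the id value are assumptions for illustration; only the data values and the existence of an id attribute come from the text.

```xml
<!-- Element-only content: the wrapper elements contain only child
     elements, and text appears only at the leaves. -->
<person id="P001">
  <name>
    <given>Michael</given>
    <given>Howard</given>
    <family>Kay</family>
  </name>
  <date-of-birth>1951-10-11</date-of-birth>
  <place-of-birth>Hannover</place-of-birth>
</person>
```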

The schema for this might be:
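One possible shape for such a schema is sketched below. It follows the design choices discussed in this section (a global element with a separately named type, a named type for personal names, local declarations for the leaf elements, and a top-level simple type for the id attribute), but every element name, type name, and the pattern facet are assumptions.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Global element declaration, with its type named separately -->
  <xs:element name="person" type="person-type"/>

  <xs:complexType name="person-type">
    <xs:sequence>
      <!-- Local declaration that refers to a reusable named type -->
      <xs:element name="name" type="personal-name-type"/>
      <xs:element name="date-of-birth" type="xs:date"/>
      <xs:element name="place-of-birth" type="xs:token"/>
    </xs:sequence>
    <xs:attribute name="id" type="id-number"/>
  </xs:complexType>

  <xs:complexType name="personal-name-type">
    <xs:sequence>
      <xs:element name="given" type="xs:token" maxOccurs="unbounded"/>
      <xs:element name="family" type="xs:token"/>
    </xs:sequence>
  </xs:complexType>

  <!-- User-defined simple type derived from xs:ID -->
  <xs:simpleType name="id-number">
    <xs:restriction base="xs:ID">
      <xs:pattern value="[A-Z][0-9]+"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```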

There are a number of ways these definitions could have been written. In the so-called Russian Doll style, the types would be defined inline within the element declarations, rather than being given separate names of their own. The schema could also have been written using more top-level element declarations; for example, the element holding the personal name could have been declared at the top level. When you use a schema for validation, these design decisions mainly affect your ability to reuse definitions later when the schema changes. When you use a schema to describe types that can be referenced in XSLT stylesheets, however, they also affect the ease of writing the stylesheet.
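For comparison, here is a sketch of the Russian Doll style applied to a person record. The element names are hypothetical. Because the complex types are anonymous, a stylesheet has no type name to use in a test such as element(*, some-type) and must fall back on element names.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Russian Doll: one global element, everything else nested inside -->
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name">
          <!-- Anonymous type: cannot be reused or referenced by name -->
          <xs:complexType>
            <xs:sequence>
              <xs:element name="given" type="xs:token"
                          maxOccurs="unbounded"/>
              <xs:element name="family" type="xs:token"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="date-of-birth" type="xs:date"/>
        <xs:element name="place-of-birth" type="xs:token"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:ID"/>
    </xs:complexType>
  </xs:element>

</xs:schema>
```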

In choosing the representation of the schema shown above, I made a number of implicit assumptions:

It's quite likely that there will be other elements with the same structure as the outer element, or with an extension of this structure: perhaps not at the moment, but at some time in the future. Therefore, it's worth describing the element and its type separately.

Similarly, personal names are likely to appear in a number of different places. Elements with this type won't always have the same name, so it's a good idea to create a type definition that can be referenced from any element.

Not every element with the name used here for the personal name will be a personal name; the same tag might also be used (even in the same namespace) for other purposes. If I were confident that the tag would always be used for personal names, then I would probably have made it the subject of a top-level element declaration, rather than defining it inline within the declaration of the outer element.

The elements at the leaves of the tree (those with simple types) are probably best defined using local element declarations rather than top-level declarations. Even if they are used in more than one container element, there is relatively little to be gained by pulling the element declarations out to the top level. The important thing is that if any of them have a user-defined type (which isn't the case in this example), then the user-defined types are defined using top-level xs:simpleType declarations. This is what I have done for the id attribute (which is defined as a subtype of xs:ID, forcing values to be unique within any XML document), but I chose not to do the same for the leaf elements.

XSLT 2.0 allows you to validate elements against either a global element declaration or a global type definition, so you'll be able to validate an element provided that either the element declaration or the type definition is global. If you're planning to use XQuery with your schema, however, it's worth bearing in mind that XQuery doesn't allow validation against a type definition, so validation is only possible if the element declaration is global. It's useful to be able to validate individual elements because you can then assign the elements to variables or function parameters that require an element of a particular type. The alternative is to defer validation until a complete document has been constructed.
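The two forms of element-level validation can be sketched as follows, assuming a schema-aware XSLT 2.0 processor. The file name person.xsd, the element name person, and the type name person-type are hypothetical; the schema is assumed to have no target namespace and to declare both the element and the type globally.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical schema with a global element "person" and a
       global type "person-type" -->
  <xsl:import-schema schema-location="person.xsd"/>

  <xsl:template match="/">

    <!-- Validate against the global element declaration -->
    <xsl:variable name="by-element" as="schema-element(person)?">
      <xsl:copy-of select="(//person)[1]" validation="strict"/>
    </xsl:variable>

    <!-- Validate against the global type definition -->
    <xsl:variable name="by-type" as="element(*, person-type)?">
      <xsl:copy-of select="(//person)[1]" type="person-type"/>
    </xsl:variable>

    <!-- Either variable can now be passed wherever a typed
         element is required -->
    <checked count="{count(($by-element, $by-type))}"/>

  </xsl:template>

</xsl:stylesheet>
```

If the copied element is invalid, the transformation fails at the xsl:copy-of instruction rather than later, when the variable is used.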

Defining a Type Hierarchy

Using top-level type definitions is very handy when you have many different elements using the same type definitions. I've come across an example of this in action when handling files containing genealogical data. A lot of this data is concerned with recording events: events such as births, baptisms, marriages, deaths, and burials, but also many other miscellaneous events such as a mention in a newspaper, enrollment at a school or university, starting a new job, receiving a military honor, and so on. Traditionally, this data is recorded using a file format called GEDCOM, which predates XML by many years, but can very easily be translated directly into XML and manipulated using XSLT, as we will see in Chapter 19.

The GEDCOM specification defines about 30 kinds of event such as BIRTH, DEATH, and MARRIAGE, and then provides a general catch-all EVENT record for anything else you might want to keep information about. All these records have a common structure: they allow information about the date and place of the event, the sources of information about the event, the participants and witnesses, and so on. In other words, they are all different elements with the same type.
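In schema terms, "different elements with the same type" could be sketched like this. The type name EVENT and the element names BIRTH, DEATH, MARRIAGE, and EVENT come from the text; the child elements DATE and PLACE and their types are assumptions, and a real schema would describe far more of the common structure.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- One named type holding the structure common to all events -->
  <xs:complexType name="EVENT">
    <xs:sequence>
      <xs:element name="DATE" type="xs:string" minOccurs="0"/>
      <xs:element name="PLACE" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Many element names, all referring to the same type. Element and
       type names occupy separate symbol spaces, so an element named
       EVENT can coexist with a type named EVENT. -->
  <xs:element name="BIRTH" type="EVENT"/>
  <xs:element name="DEATH" type="EVENT"/>
  <xs:element name="MARRIAGE" type="EVENT"/>
  <xs:element name="EVENT" type="EVENT"/>

</xs:schema>
```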

In XSLT 1.0, the only way of referring to elements was by name. This meant that if you wanted to write a template rule to process any kind of event, you had to know all the element names representing events, and write a union expression of the form BIRTH|DEATH|MARRIAGE|... to select them. This is tedious to say the least, and it is also inextensible: when new kinds of event are introduced, the stylesheet stops working.
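A sketch of the name-based approach in XSLT 1.0 follows. Only three event names are listed; a real stylesheet would have to enumerate every event element, and the output element event is hypothetical.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Every kind of event has to be named explicitly; an event
       element missing from this list is not matched by the rule -->
  <xsl:template match="BIRTH | DEATH | MARRIAGE">
    <event kind="{name()}">
      <xsl:apply-templates/>
    </event>
  </xsl:template>

</xsl:stylesheet>
```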

XSLT 2.0 introduces the ability to refer to elements by type: you can now write a template that specifies match="element(*, EVENT)", which matches all elements of type EVENT. The * indicates that you don't care what the name of the element is; you are interested only in its type. This is both more convenient and more flexible than listing all the different kinds of event by name.
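A minimal sketch of such a rule, assuming a schema-aware XSLT 2.0 processor and a source document that has been validated so that its elements carry type annotations. The file name gedcom.xsd and the output element event are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical schema that defines the global type EVENT -->
  <xsl:import-schema schema-location="gedcom.xsd"/>

  <!-- Matches any element whose type annotation is EVENT, or a type
       derived from EVENT, whatever the element is called -->
  <xsl:template match="element(*, EVENT)">
    <event kind="{name()}">
      <xsl:apply-templates/>
    </event>
  </xsl:template>

</xsl:stylesheet>
```

A new kind of event added to the schema with this type is picked up by the rule without any change to the stylesheet.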
