Read XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition Online
Authors: Michael Kay
Elements with Mixed Content
The type of an element that can contain child elements is called a complex type with complex content. Such types essentially fall into three categories, called empty content, mixed content, and element-only content. Mixed content allows intermingled text and child elements, and is often found in narrative XML documents, allowing markup such as:
further to
The type of this element could be declared in a schema as:
In practice, the list of permitted child elements would probably be much longer than this, and a common technique is to define substitution groups, which allow a list of such elements to be referred to by a single name.
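As a minimal sketch of a mixed-content declaration, the following schema defines a type that permits text interleaved with a small set of child elements. The names para, paraType, city, and year are hypothetical, chosen only for illustration, and are not taken from the original listing.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical narrative element; all names here are illustrative. -->
  <xs:element name="para" type="paraType"/>

  <!-- mixed="true" allows character data between the child elements. -->
  <xs:complexType name="paraType" mixed="true">
    <!-- A repeated choice lets the children appear in any order,
         any number of times (including not at all). -->
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element name="city" type="xs:token"/>
      <xs:element name="year" type="xs:gYear"/>
    </xs:choice>
  </xs:complexType>

</xs:schema>
```

Under these assumptions, an instance such as <para>She moved to <city>Leeds</city> in <year>1874</year>.</para> would be valid against the para declaration.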
Narrative documents tend to be less constrained than documents holding structured data such as purchase orders and invoices, and while schema validation is still very useful, the type annotations generated as a result of validation aren't generally so important when the time comes to process the data using XSLT; the names of the elements are usually more significant than their types. However, there is plenty of potential for using the types, especially if the schema is designed with this in mind.
When schemas are used primarily for validation, the tendency is to think of types in terms of the form that values assume. For example, it is natural to define the element
xs:token
by restriction, because the names of cities are strings, perhaps consisting of multiple words, in which spaces are not significant. Once types start to be used for processing information (which is what you are doing when you use XSLT), it's also useful to think about what the value actually means. The content of the
Elements with Element-Only Content
This category covers most of the “wrapper” elements that are found in data-oriented XML. A typical example is the outer
The schema for this might be:
There are a number of ways these definitions could have been written. In the so-called Russian Doll style, the types would be defined inline within the element declarations, rather than being given separate names of their own. The schema could have been written using more top-level element declarations, for example the
In choosing the representation of the schema shown above, I made a number of implicit assumptions:
XSLT 2.0 allows you to validate elements against either a global element declaration or a global type definition, so you'll be able to validate an element provided that either the element declaration or the type definition is global. If you're planning to use XQuery with your schema, however, it's worth bearing in mind that XQuery doesn't allow validation against a type definition, so validation is only possible if the element declaration is global. It's useful to be able to validate individual elements because you can then assign the elements to variables or function parameters that require an element of a particular type. The alternative is to defer validation until a complete document has been constructed.
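The two validation routes might be sketched as follows. This assumes a schema-aware XSLT 2.0 processor and a hypothetical no-namespace schema, people.xsd, that declares a global element person and a global type personType with a single name child; none of these names come from the original text.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical schema; assumed to declare a global element "person"
       and a global complex type "personType". -->
  <xsl:import-schema schema-location="people.xsd"/>

  <xsl:template match="/">
    <!-- Route 1: validate against the global element declaration. -->
    <xsl:variable name="by-element" as="schema-element(person)">
      <person xsl:validation="strict">
        <name>Jane Doe</name>
      </person>
    </xsl:variable>

    <!-- Route 2: validate against the global type definition. -->
    <xsl:variable name="by-type" as="element(*, personType)">
      <person xsl:type="personType">
        <name>Jane Doe</name>
      </person>
    </xsl:variable>

    <xsl:sequence select="$by-element, $by-type"/>
  </xsl:template>

</xsl:stylesheet>
```

Because each variable declares the required type in its as attribute, a processor can reject an unvalidated or wrongly typed element at the point where it is assigned, rather than only when the complete result document is validated.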
Defining a Type Hierarchy
Using top-level type definitions is very handy when you have many different elements using the same type definitions. I've come across an example of this in action when handling files containing genealogical data. A lot of this data is concerned with recording events: events such as births, baptisms, marriages, deaths, and burials, but also many other miscellaneous events such as a mention in a newspaper, enrollment at a school or university, starting a new job, receiving a military honor, and so on. Traditionally, this data is recorded using a file format called GEDCOM, which predates XML by many years, but can very easily be translated directly into XML and manipulated using XSLT, as we will see in Chapter 19.
The GEDCOM specification defines about 30 kinds of event such as BIRTH, DEATH, and MARRIAGE, and then provides a general catch-all EVENT record for anything else you might want to keep information about. All these records have a common structure: they allow information about the date and place of the event, the sources of information about the event, the participants and witnesses, and so on. In other words, they are all different elements with the same type.
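A schema in this spirit might be sketched as below. The type name EVENT and the element names BIRTH, DEATH, MARRIAGE, and EVENT follow the text; the child elements DATE, PLACE, and SOURCE and their string types are simplifying assumptions, not the actual GEDCOM structure.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- One shared type for every kind of event. The children shown are
       illustrative assumptions, not the real GEDCOM record layout. -->
  <xs:complexType name="EVENT">
    <xs:sequence>
      <xs:element name="DATE" type="xs:string" minOccurs="0"/>
      <xs:element name="PLACE" type="xs:string" minOccurs="0"/>
      <xs:element name="SOURCE" type="xs:string"
                  minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Different element names, all bound to the same global type.
       Element and type names live in separate symbol spaces, so an
       element and a type can both be called EVENT. -->
  <xs:element name="BIRTH" type="EVENT"/>
  <xs:element name="DEATH" type="EVENT"/>
  <xs:element name="MARRIAGE" type="EVENT"/>
  <xs:element name="EVENT" type="EVENT"/>

</xs:schema>
```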
In XSLT 1.0, the only way of referring to elements was by name. This meant that if you wanted to write a template rule to process any kind of event, you had to know all the element names representing events, and write a union expression of the form BIRTH|DEATH|MARRIAGE|... to select them. This is tedious to say the least, and it is also inextensible: when new kinds of event are introduced, the stylesheet stops working.
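A name-based rule of this kind might look like the following sketch. Only the three event names mentioned in the text are listed; the output element event and its kind attribute are hypothetical.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Every event element has to be named explicitly. Only three are
       listed here; each further kind of event must be added by hand. -->
  <xsl:template match="BIRTH | DEATH | MARRIAGE">
    <event kind="{name()}">
      <xsl:apply-templates/>
    </event>
  </xsl:template>

</xsl:stylesheet>
```

An event element whose name is missing from the pattern simply does not match this rule, which is how the stylesheet quietly breaks when the vocabulary grows.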
XSLT 2.0 introduces the ability to refer to elements by type: you can now write a template that specifies match="element(*, EVENT)", which matches all elements of type EVENT. The * indicates that you don't care what the name of the element is; you are interested only in its type. This is both more convenient and more flexible than listing all the different kinds of event by name.
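A type-based rule might be sketched as follows, assuming a schema-aware XSLT 2.0 processor, input that has been validated so that its elements carry type annotations, and a hypothetical schema file gedcom.xsd that defines the type EVENT. The output element event and its kind attribute are also hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical schema location; assumed to define the type EVENT. -->
  <xsl:import-schema schema-location="gedcom.xsd"/>

  <!-- Matches any element whose type annotation is EVENT, or a type
       derived from EVENT, whatever the element is called. -->
  <xsl:template match="element(*, EVENT)">
    <event kind="{name()}">
      <xsl:apply-templates/>
    </event>
  </xsl:template>

</xsl:stylesheet>
```

If the schema later gains a new element declared with type EVENT, this rule picks it up without any change to the stylesheet.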