Read XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition Online
Authors: Michael Kay
Despite this clarification of the rules, I wouldn't normally recommend using the xml:space attribute in a stylesheet, but if there are large chunks of existing XML that you want to copy into the stylesheet verbatim, the technique can be useful.
Solving Whitespace Problems
There are two typical problems with whitespace in the output: too much of it, or too little.
If you are generating HTML, a bit of extra whitespace usually doesn't matter, though there are some places where it can slightly distort the layout of your page. With some text formats, however (a classic example is comma-separated values) you need to be very careful to output whitespace in exactly the right places.
Too Much Whitespace
If you are getting too much whitespace, there are three possible places it can be coming from:
First ensure that you set indent="no" on the <xsl:output> element.
If the output whitespace is adjacent to text, then it probably comes from the same place as that text.
If the offending whitespace is between tags in the output, then it probably comes from whitespace nodes in the source tree that have not been stripped, and the remedy is to add an <xsl:strip-space> element to the stylesheet.
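The two remedies can be sketched as top-level declarations in a stylesheet. This is a minimal illustration rather than a recommended configuration: elements="*" is an assumption that whitespace-only text nodes can safely be stripped from every source element, which may not hold for mixed content.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Rule out whitespace added by the serializer when it indents the output -->
  <xsl:output indent="no"/>

  <!-- Strip whitespace-only text nodes from all elements of the source tree
       (assumption: no element relies on such nodes being significant) -->
  <xsl:strip-space elements="*"/>

</xsl:stylesheet>
```

If only some elements should be stripped, the elements attribute can list their names instead of using the wildcard.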
Too Little Whitespace
If you want whitespace in the output and aren't getting it, use <xsl:text> to generate it at the appropriate point.
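As an illustration, consider a fragment that outputs the lines of a poem as HTML, ending each one with a <br/> element. This is a sketch only; the element name line is an assumption about the source document.

```xml
<!-- Assumes the context node has child elements named "line" (hypothetical) -->
<xsl:for-each select="line">
  <xsl:value-of select="."/><br/>
</xsl:for-each>
```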
This will display perfectly correctly in the browser, but if you want to view the HTML in a text editor, it will be difficult because everything goes on a single line. It would be useful to start a newline after each <br> element; you can do this as follows:
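A minimal sketch of the idea, assuming the lines of the poem are held in elements named line (a hypothetical name): an xsl:text instruction writes an explicit newline character after each <br/>.

```xml
<xsl:for-each select="line">
  <xsl:value-of select="."/><br/>
  <!-- &#xa; is a newline; inside xsl:text it is never stripped from the stylesheet -->
  <xsl:text>&#xa;</xsl:text>
</xsl:for-each>
```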
Another trick I have used to achieve this is to exploit the fact that the non-breaking-space character (#xA0), although invisible, is not classified as whitespace. So you can achieve the required effect by writing:
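Again as a sketch, with the element name line assumed for illustration: the character reference &#xa0; is placed immediately before the line break in the stylesheet.

```xml
<xsl:for-each select="line">
  <!-- The text node after <br/> holds a non-breaking space plus the newline
       (and indentation) that follows it, so it is not a whitespace-only node -->
  <xsl:value-of select="."/><br/>&#xa0;
</xsl:for-each>
```

Note that the non-breaking space itself is also written to the output, along with the newline.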
This works because the newline after the <br/> is now part of a non-whitespace node.
The purpose of this chapter was to study the overall structure of a stylesheet, before going into the detailed specification of each element in Chapter 6. We've now covered the following:
The next chapter describes how to use XSLT stylesheets together with an XML Schema for the source and/or result documents. If you are not interested in using schemas, you can probably skip that chapter and move straight to Chapter 5, which gives detailed information about the data types available in the XDM model and the ways in which you can use them.
Chapter 4
Stylesheets and Schemas
One of the most important innovations in XSLT 2.0 is that stylesheets can take advantage of the schemas you have defined for your input and output documents. This chapter explores how this works.
This feature is an optional part of XSLT 2.0, in two significant ways:
There is no space in this book for a complete description of XML Schema. If you want to start writing schemas, I would recommend you read XML Schema by Eric van der Vlist (O'Reilly & Associates, 2002) or Definitive XML Schema by Priscilla Walmsley (Prentice Hall, 2002). XML Schema is a large and complicated specification, certainly as large as XSLT itself. However, it's possible that you are not writing your own schemas, but writing stylesheets designed to work with a schema that someone else has already written. If this is the case, I hope you will find the short overview of XML Schema in this chapter a useful introduction.
XML Schema: An Overview
The primary purpose of an XML Schema is to enable documents to be validated: it defines a set of rules that XML documents must conform to, and enables documents to be checked against these rules. This means that organizations using XML to exchange invoices and purchase orders can agree on a schema defining the rules for these messages, and both parties can validate the messages against the schema to ensure that they are right. So the schema, in effect, defines a type of document, and this is why schemas are central to the type system of XSLT.
In fact, the designers of XML Schema were more ambitious than this. They realized that rather than simply giving a “yes” or “no” answer, processing a document against a schema could make the application's life easier by attaching labels to the validated document indicating, for each element and attribute in the document, which schema definitions it was validated against. In the language of XML Schema, this document with validation labels is called a Post Schema Validation Infoset, or PSVI. The XDM data model used by XSLT and XPath is based on the PSVI, but it only retains a subset of the information in the PSVI; most importantly, the type annotations attached to element and attribute nodes.
We begin by looking at the kinds of types that can be defined in XML Schema, starting with simple types and moving on to progressively more complex types.
Simple Type Definitions
Let's suppose that many of your messages refer to part numbers, and that part numbers have a particular format such as ABC12345. You can start by defining this as a type in the schema:
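A definition along the following lines would do this. It is a sketch under stated assumptions: the pattern is inferred from the example format ABC12345 (three uppercase letters followed by five digits), the schema has no target namespace, and the type is named part-number.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:simpleType name="part-number">
    <!-- xs:token strips leading and trailing whitespace before validation -->
    <xs:restriction base="xs:token">
      <!-- Assumed format: three uppercase letters, then five digits -->
      <xs:pattern value="[A-Z]{3}[0-9]{5}"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```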
Part number is a simple type because it doesn't have any internal node structure (that is, it doesn't contain any elements or attributes). I have defined it by restriction from xs:token, which is one of the built-in types that come for free with XML Schema. I could have chosen to base the type on xs:string, but xs:token is probably better because with xs:string, leading and trailing whitespace is considered significant, whereas with xs:token, it gets stripped automatically before the validation takes place. The particular restriction in this case is that the value must match the regular expression given in the pattern facet.
Having defined this type, you can now refer to it in definitions of elements and attributes. For example, you can define the element:
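For illustration, an element declaration of this kind might look as follows. The element name part is hypothetical, and the declaration is assumed to sit inside the same xs:schema element that defines the part-number type.

```xml
<!-- Hypothetical element name; its content must be a valid part-number -->
<xs:element name="part" type="part-number"/>
```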
This allows documents to contain elements with this name. Of course, you can also define other elements that have the same type, for example:
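A second declaration referring to the same type could be sketched like this. The element name subpart is hypothetical, and the declaration is assumed to sit in the xs:schema element that defines part-number.

```xml
<!-- A differently named element (hypothetical) validated by the same type -->
<xs:element name="subpart" type="part-number"/>
```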
Note the distinction between the name of an element and its type. Many element declarations in a schema (declarations that define elements with different names) can refer to the same type definition, if the rules for validating their content are the same. It's also permitted, though I won't go into the detail just yet, to use the same element name at different places within a document with different type definitions.
You can also use the same type definition in an attribute, for example:
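An attribute declaration using the type might be sketched as follows. The attribute name part-nr is hypothetical; the declaration is assumed to appear in the xs:schema element that defines part-number, either at the top level or within a complex type.

```xml
<!-- Hypothetical attribute name; its value must be a valid part-number -->
<xs:attribute name="part-nr" type="part-number"/>
```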
You can declare variables and parameters in a stylesheet whose values must be elements or attributes of a particular type. Once a document has been validated using this schema, elements that have been validated against the element declarations given above, and attributes that have been validated against the attribute declaration, will carry the type annotation part-number, and they can be assigned to variables such as:
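The following sketch shows such declarations in context. It assumes a schema-aware processor, a validated source document, and a schema with no target namespace stored at the hypothetical location parts.xsd; the variable names are likewise illustrative.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical location of the schema that defines part-number -->
  <xsl:import-schema schema-location="parts.xsd"/>

  <!-- Matches any element annotated as part-number, whatever its name -->
  <xsl:template match="element(*, part-number)">
    <xsl:variable name="part-element" as="element(*, part-number)" select="."/>
    <xsl:value-of select="$part-element"/>
  </xsl:template>

  <!-- Matches any attribute annotated as part-number, whatever its name -->
  <xsl:template match="attribute(*, part-number)">
    <xsl:variable name="part-attribute" as="attribute(*, part-number)" select="."/>
    <xsl:value-of select="$part-attribute"/>
  </xsl:template>

</xsl:stylesheet>
```

Assigning a node whose type annotation is not part-number (or a type derived from it) to either variable would be reported as a type error.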
A variable declared with the type element(*, part-number) is allowed to contain any element node that has the type annotation part-number. If further types have been defined as restricted subtypes of part-number, these can be assigned to the variable too. The * indicates that we are not concerned with the name of the element or attribute, but only with its type.
There are actually three varieties of simple types that you can define in XML Schema: atomic types, list types, and union types. Atomic types are treated specially in the XPath/XSLT type system, because values of an atomic type (called, naturally enough, atomic values) can be manipulated as freestanding items, independently of any node. Like integers, booleans, and strings, part numbers as defined above are atomic values, and you can hold a part number or a sequence of part numbers directly in a variable, without creating any node to contain it. For example, the following declaration defines a variable whose value is a sequence of three part numbers:
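<!-- The variable name is illustrative; as="part-number*" allows a sequence of part numbers -->
<xsl:variable name="part-numbers" as="part-number*"
    select="for $p in ('WZH94623', 'BYF67253', 'PRG83692')
            return $p cast as part-number"/>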
Simple types in XML Schema are not the same thing as atomic types in the XPath data model. This is because a simple type can also allow a sequence of values. For example, it is possible to define the following simple type:
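A definition of this shape can be sketched as follows. The type name colors, the base type xs:NCName, and the values green and blue are assumptions chosen for illustration; only the overall structure (a named list type wrapping an anonymous enumerated type that includes red) follows the description in the text.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Outer, named type: a whitespace-separated list of items -->
  <xs:simpleType name="colors">
    <xs:list>
      <!-- Inner, anonymous type (no name attribute): a single atomic value -->
      <xs:simpleType>
        <xs:restriction base="xs:NCName">
          <xs:enumeration value="red"/>
          <xs:enumeration value="green"/>
          <xs:enumeration value="blue"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:list>
  </xs:simpleType>

</xs:schema>
```

With this definition, an element or attribute of type colors could contain a value such as "red blue", which is treated as a sequence of two atomic values.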
There are actually two type definitions here. The inner type is anonymous, because the xs:simpleType element that defines it has no name attribute. It defines an atomic value, which must belong to a particular base type and, more specifically, must be one of an enumerated set of values that includes red. The outer type is a named type (which means it can be referenced from elsewhere in the schema), and it defines a list type whose individual items must conform to the inner type.