XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition (686 page)

Authors: Michael Kay

But remember that this is likely to work only if the output is serialized by the XSLT processor; it won't work if you write the result to a DOM.

Character Maps as a Substitute for disable-output-escaping

Character maps are less powerful than disable-output-escaping, because you can't switch them on and off for different parts of the result tree. But this is also their strength. The problem with disable-output-escaping is that it requires some extra information to pass between the transformation engine and the serializer, in addition to the information that's defined in the data model. (As evidence for this, look at the clumsy way that disable-output-escaping requests are encoded in a SAXResult stream in the Java JAXP interface.) This information is generally lost if you want to pass the result tree to another application before serializing it. The problem gets worse in XSLT 2.0, which allows temporary trees and parentless text nodes to be created and processed within the course of a transformation. One of the difficulties in designing this feature was whether a request to disable output escaping should be meaningful when the data being written was not being passed straight to the serializer, but was being written to a temporary tree or a parentless text node.

Most of the things that can be done with disable-output-escaping, including the bad things, can also be done with character maps. The big advantage of character maps is that they don't distort the data model, which means that they don't impact your ability to use a stylesheet-based transformation as a component in an application with clean interfaces to other components.

If you want to convert code that was written to use disable-output-escaping to use character maps instead, the most direct approach is to define substitutes for the characters that are changed by XML escaping:

... xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:doe="http://www.wrox.com/xslt/ch15/doe">

... '&doe-lt;&doe-amp;&doe-gt;&doe-apos;&doe-quot;')"/>

Then, wherever the existing code writes a string with disable-output-escaping="yes", change it to write the translated string instead, without the disable-output-escaping attribute. This will replace the characters that are normally escaped by their substitutes, and the substitutes will be turned back into the unescaped original characters during serialization, by virtue of the character map.
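The conversion involves three cooperating pieces: substitute characters, a character map, and a function that translates the escaped characters into their substitutes. The following is a sketch under assumptions, not the book's own listing. The names doe:map, doe:escape, doe:from, and doe:to, and the choice of code points xE801 to xE805, are hypothetical; they were picked to match the entity names and the private-use characters mentioned in the text.

```xml
<?xml version="1.0"?>
<!DOCTYPE xsl:stylesheet [
  <!-- Assumed private-use code points that stand in for the escaped characters -->
  <!ENTITY doe-lt   "&#xE801;">
  <!ENTITY doe-amp  "&#xE802;">
  <!ENTITY doe-gt   "&#xE803;">
  <!ENTITY doe-apos "&#xE804;">
  <!ENTITY doe-quot "&#xE805;">
]>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:doe="http://www.wrox.com/xslt/ch15/doe"
    exclude-result-prefixes="xs doe">

  <!-- The serializer applies the map to everything it writes -->
  <xsl:output method="xml" use-character-maps="doe:map"/>

  <!-- Each substitute is turned back into the raw character on output -->
  <xsl:character-map name="doe:map">
    <xsl:output-character character="&doe-lt;"   string="&lt;"/>
    <xsl:output-character character="&doe-amp;"  string="&amp;"/>
    <xsl:output-character character="&doe-gt;"   string="&gt;"/>
    <xsl:output-character character="&doe-apos;" string="&apos;"/>
    <xsl:output-character character="&doe-quot;" string="&quot;"/>
  </xsl:character-map>

  <!-- Held in variables so that both quote characters can appear
       without clashing with XPath string delimiters -->
  <xsl:variable name="doe:from" as="xs:string">&lt;&amp;&gt;'"</xsl:variable>
  <xsl:variable name="doe:to" as="xs:string">&doe-lt;&doe-amp;&doe-gt;&doe-apos;&doe-quot;</xsl:variable>

  <!-- Hypothetical helper: swaps escapable characters for their substitutes -->
  <xsl:function name="doe:escape" as="xs:string">
    <xsl:param name="in" as="xs:string?"/>
    <xsl:sequence select="translate($in, $doe:from, $doe:to)"/>
  </xsl:function>

  <!-- Usage: the string passed here is the five characters of an empty br tag -->
  <xsl:template match="/">
    <out>
      <xsl:value-of select="doe:escape('&lt;br/&gt;')"/>
    </out>
  </xsl:template>

</xsl:stylesheet>
```

When the processor serializes the result itself, the content of the out element should appear as markup rather than as escaped text. If the result is delivered as a tree instead, the receiving application sees the substitute characters.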

Although this mechanical replacement of disable-output-escaping by character maps will always work, there may often be better ways of doing it in particular circumstances.

Remember that if you expose the unserialized result tree to another application, it will see the private-use characters such as xE801 in text and attribute nodes.

Summary

This chapter has described the four serialization methods XML, HTML, XHTML, and TEXT, which can be invoked to process the XSLT result tree. It also explained the new XSLT 2.0 facility of character maps, and the XSLT 1.0 disable-output-escaping capability which it replaces, both of which are there to get you out of sticky corners when the standard serialization mechanisms prove inadequate.

This provides a nice link into the next chapter, which describes the range of techniques that allow vendors and users to extend the capability of XSLT when there is a need to do things that are outside the scope of the standard.

Part III

Exploitation

Chapter 16: Extensibility
Chapter 17: Stylesheet Design
Chapter 18: Case Study: XMLSpec
Chapter 19: Case Study: A Family Tree
Chapter 20: Case Study: Knight's Tour

Chapter 16

Extensibility

Previous chapters have discussed standard features of the XSLT language. This chapter discusses what happens when you need to stray beyond the XSLT 2.0 language specification. It's concerned with questions such as:

What extensions are vendors allowed to provide?
How much are implementations allowed to vary from each other?
How can you write your own extensions?
How can you write stylesheets that will run on more than one vendor's XSLT processor?

There is some interesting history here. XSLT 1.0 allowed stylesheets to call user-written extension functions but provided no standard way of writing them. The draft XSLT 1.1 specification defined a general mechanism for creating extension functions written in any language and then defined detailed interfaces for Java and JavaScript (or ECMAScript, to give it its vendor-neutral name). This specification was published as a working draft but was subsequently withdrawn. There were a number of reasons for this, one of which was simply that events were overtaken by the more ambitious XSLT 2.0 initiative. But part of the reason was that the proposals for standardizing extension function interfaces attracted heavy public criticism (see http://xml.coverpages.org/withdraw-xslScript.html). It's difficult in retrospect to summarize the arguments that were raised against the idea, but they probably fell into three categories: some people thought extension functions were a bad idea in principle and should not be encouraged; some people disapproved of singling out two languages (Java and JavaScript) for special treatment; and some people felt that the W3C shouldn't be putting language bindings into the core XSLT specification, and that the job should be done in separate specifications, preferably produced by a different organization.

The result of this minor furor is that there is no defined interface for writing extension functions, either in XSLT 1.0 or in XSLT 2.0. However, conventions have emerged at least for XSLT 1.0 (the draft 1.1 specification was influenced by these conventions, and in turn exerted its own influence on the products, despite being abandoned), and it is worth giving these some space.

At the time of writing this edition, only a limited number of XSLT 2.0 processors are available, and it is difficult to see trends emerging as to what capabilities vendors will choose to provide. However, there's no reason to believe that this will be significantly different from the capabilities often found in XSLT 1.0 processors. Some of the examples in this chapter therefore relate to XSLT 1.0 processors such as MSXML from Microsoft and Xalan-J from Apache.

What Vendor Extensions Are Allowed?

The XSLT 2.0 language specification makes no distinction between what vendors are allowed to do, and what users and third parties are allowed to do. For example, it says that the set of languages supported by the format-date() function is implementation-defined. This can be interpreted in two ways:

Vendors can support as many or as few languages as they think their target market requires.
Vendors are allowed (but not required) to provide localization mechanisms that enable users or third parties to extend the set of supported languages.

Nowhere in the XSLT specification does it say that implementors must provide facilities for users to define their own extensions. Many implementations will choose to do so, but to find out what extensibility is permitted by the language, we need to look at two things: firstly, the information that is defined to be part of the context or environment, and secondly, the features of the language whose behavior is implementation-defined. There are detailed lists of these features in the W3C specification, but they fall into a few broad categories.

Some features of the language are optional, in the sense that conformant processors are not required to provide them. For example, a processor can choose not to implement schema-aware processing, and it can choose not to implement the disable-output-escaping attribute or the namespace axis.
Interfaces between the XSLT processor and the outside world are generally implementation-defined. This includes the mechanisms for invoking the XSLT processor and delivering its results, the mechanism for reporting errors, and the details of how URIs are interpreted, both in the XSLT declarations that reference other resources by URI and in the document() and doc() functions.
The XSLT vocabulary is extensible in five key areas. In each of these cases, the vendor can extend the vocabulary and, if they wish, they can also enable users or third parties to extend it:
- Extension functions:
  The set of functions that can be called from XPath expressions, and any mechanisms for adding additional functions, are implementation-defined, as long as any functions outside the language-defined core are in a separate namespace.
- Extension instructions:
  The set of instructions that can appear in a sequence constructor is extensible, as long as the namespace used for any extension instructions is declared in the stylesheet in an extension-element-prefixes attribute.
- Extension attributes:
  Additional attributes can be added to any XSLT element, as long as they are in a separate namespace. There are rules limiting the effect that such attributes may have: essentially, they must not change the result of the transformation except to the extent that the W3C specification leaves the result explicitly implementation-defined.
- Extension declarations:
  Additional top-level declarations can be defined in the stylesheet, provided that the element name is in a separate namespace. These are subject to the same constraints as extension attributes.
- Extension types:
  Additional types can be made available. This feature is defined primarily so that extension functions can return application-oriented objects (for example, a sql:connect() function might return an object of type sql:DatabaseConnection), but there are no limits on how the facility might be used.
The set of collations that can be used for sorting and comparing strings is implementation-defined.
Many localization attributes, for example those used to control the formatting of dates and numbers, have an implementation-defined range of possible values.
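The namespace rules for extension functions and extension instructions can be illustrated with a small sketch. Everything in the acme namespace here is hypothetical (the namespace URI, the acme:log instruction, and the acme:timestamp() function); no real product is implied. The stylesheet is written so that a processor without these extensions still runs it: xsl:fallback covers the missing instruction, and use-when removes the call to the missing function at compile time.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:acme="http://example.com/acme-extensions"
    extension-element-prefixes="acme">

  <xsl:template match="/">
    <report>
      <!-- Extension instruction: recognized as an instruction only because
           its prefix is listed in extension-element-prefixes -->
      <acme:log message="starting report">
        <xsl:fallback>
          <xsl:message>acme:log is not available; continuing without it</xsl:message>
        </xsl:fallback>
      </acme:log>

      <!-- Extension function: lives in its own namespace, outside the core library.
           Exactly one of the two instructions below survives compilation. -->
      <generated>
        <xsl:value-of select="acme:timestamp()"
                      use-when="function-available('acme:timestamp')"/>
        <xsl:value-of select="current-dateTime()"
                      use-when="not(function-available('acme:timestamp'))"/>
      </generated>
    </report>
  </xsl:template>

</xsl:stylesheet>
```

On a processor that provides neither extension, this should emit the message and fall back to current-dateTime().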

When the specification says that the behavior of a particular feature is implementation-defined, this places an onus on the vendor of a conformant product to describe in the product documentation what choices they have made. There are also some features of the language that are implementation-dependent: the difference here is that vendors are not expected to document the exact behavior of the product. An example of an implementation-dependent feature is the maximum depth of recursion that is permitted. This will depend on a great many factors outside the software vendor's direct control, so it's not reasonable to expect a definitive statement.

Extension Functions

Extending the library of functions that can be called from XPath expressions has proved to be by far the most important way in which vendors extend the capability of the language, and so we will concentrate most of our attention on this particular extensibility mechanism.

When Are Extension Functions Needed?

There are a number of reasons you might want to call an extension function from your stylesheet:

You might want to get data held externally, perhaps in a database or in an application.
You may need to access system services that are not directly available in XSLT or XPath. For example, you might want to use a random number generator, or append a record to a log file.
You might want to perform a complex calculation that is cumbersome to express in XSLT, or that performs poorly. For example, if you are generating SVG graphics, you might need to use trigonometric functions such as sin() and cos(). This situation arises far less with XSLT 2.0 than it did in 1.0, because the core function library is so much richer, especially in its ability to do string manipulation and date/time arithmetic. But if the function you need is out there in some Java library, it's no crime to call it.
A more questionable use of external functions is to get around the “no side effects” rule in XSLT, for example to update a counter. Avoid this if you can; if you need such facilities, then you haven't yet learned to think about solving problems in the way that is natural for XSLT. More on this in the next chapter.
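As an illustration of the trigonometric case, the sketch below places points around a circle in an SVG drawing by calling the static methods of java.lang.Math. It assumes a Java-based processor that binds a namespace of the form java:classname to a Java class, a convention supported by Saxon in the editions that allow calls to Java methods. Other processors use different conventions, so treat the namespace URI as processor-specific. The parameter names and the dimensions are made up for the example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="java:java.lang.Math"
    exclude-result-prefixes="xs math">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical parameters: number of points and circle radius -->
  <xsl:param name="points" as="xs:integer" select="12"/>
  <xsl:param name="radius" as="xs:double" select="100e0"/>

  <xsl:variable name="pi" as="xs:double" select="3.141592653589793e0"/>

  <xsl:template match="/">
    <svg xmlns="http://www.w3.org/2000/svg" width="220" height="220">
      <xsl:for-each select="0 to $points - 1">
        <!-- Angle in radians for the current point -->
        <xsl:variable name="angle" as="xs:double"
                      select=". * 2 * $pi div $points"/>
        <!-- math:cos() and math:sin() resolve to the static Java methods -->
        <circle r="3"
                cx="{110 + $radius * math:cos($angle)}"
                cy="{110 + $radius * math:sin($angle)}"/>
      </xsl:for-each>
    </svg>
  </xsl:template>

</xsl:stylesheet>
```

The price of the convenience is that the stylesheet now runs only on processors that understand this binding.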

There are two ways of using extension functions in XSLT. You can write your own extension functions, or you can call extension functions that already exist. These functions might be provided by your XSLT vendor, or they might come from a third-party library such as:

Dimitre Novatchev's FXSL library at http://fxsl.sourceforge.net/. This library concentrates on providing the primitives needed for higher-order programming, and uses them to provide a basic set of operations equivalent to those found in languages such as Haskell. There are some interesting demonstrations of how these can be used to solve practical programming problems.
Priscilla Walmsley's FunctX library at http://www.xsltfunctions.com. This library, available in both XSLT and XQuery forms, provides a remarkably extensive collection of utility functions for manipulating strings, numbers, dates, node sequences, and more.
The EXSLT library found at http://www.exslt.org/ (many EXSLT functions provide capabilities that are no longer needed in 2.0, but some of them, such as the mathematical functions, are still very relevant).

Actually, some of these libraries are implemented in XSLT, which means that the functions they contain are not, strictly speaking, extensions at all. But the way you write your code to call them is the same either way, so the distinction isn't really important.
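To see why the distinction matters so little to the caller, consider a function written entirely in XSLT. The sketch below computes a square root by Newton's method; it is not the FXSL implementation, and the my namespace and function names are hypothetical. The call my:sqrt(2) looks exactly like a call to an extension function bound to external code.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/my-functions"
    exclude-result-prefixes="xs my">

  <!-- Public entry point: NaN for negative input, 0 for zero -->
  <xsl:function name="my:sqrt" as="xs:double">
    <xsl:param name="n" as="xs:double"/>
    <xsl:sequence select="if ($n lt 0) then xs:double('NaN')
                          else if ($n eq 0) then 0e0
                          else my:sqrt-step($n, $n, 50)"/>
  </xsl:function>

  <!-- One Newton iteration, recursing until the estimate stops changing
       or the iteration budget runs out -->
  <xsl:function name="my:sqrt-step" as="xs:double">
    <xsl:param name="n" as="xs:double"/>
    <xsl:param name="guess" as="xs:double"/>
    <xsl:param name="remaining" as="xs:integer"/>
    <xsl:variable name="next" as="xs:double"
                  select="($guess + $n div $guess) div 2"/>
    <xsl:sequence select="if ($remaining le 0
                              or abs($next - $guess) lt 1e-10 * $next)
                          then $next
                          else my:sqrt-step($n, $next, $remaining - 1)"/>
  </xsl:function>

  <xsl:template match="/">
    <!-- Called like any other function in a non-default namespace -->
    <root>
      <xsl:value-of select="format-number(my:sqrt(2), '0.0000')"/>
    </root>
  </xsl:template>

</xsl:stylesheet>
```

With the fixed budget of 50 iterations, very large inputs may not converge fully; a production version would choose a better starting estimate.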

Many XSLT vendors designed their interfaces for Java and JavaScript so that the extensive class libraries available in both these languages would be directly accessible to the stylesheet, with no further coding required. This is certainly true for mathematical functions, string manipulation, and date handling. Which language you choose to use to write extension functions is a matter of personal choice, though it will be heavily constrained by the XSLT processor you are using. With a Java-based processor such as Saxon or Xalan-J, the natural choice is to write extension functions in Java. With Microsoft processors, the natural choice is a .NET language such as C#. If you are using the Gestalt processor, it is probably because your favorite language is Eiffel. Processors written in C or C++ tend to require a more complex procedure for linking extension functions, if they are supported at all.

When Are Extension Functions Not Needed?

There is probably a tendency for newcomers to XSLT to write extension functions simply because they haven't worked out how to code the logic in an XSLT stylesheet function. Slipping back into a programming language you have used for years, rather than battling with an unfamiliar one, is always going to be tempting when you have deadlines to meet. It's understandable, but it's not the right thing to do.

There are other wrong reasons for using extension functions. These include:

Believing that an XSLT implementation of the logic is bound to be slower: Don't believe this until you have proved it by measurement, and don't let it influence you unless you need the extra performance. I did a quick test to compare the FXSL code for calculating square roots (to four decimal places) using pure XSLT with a call to Java. This is a worst-case scenario because it's very computation-intensive. Using FXSL took around 1900µs per call, while calling Java took 12µs. So there's a significant difference, but the question is, does it matter? Is that 1900µs going to be noticeable on the bottom line, and is it worth the cost of making your stylesheet processor-dependent?
Supplying external data to the stylesheet: The best way to supply information to the stylesheet is in the form of a stylesheet parameter. Another good way is to provide the data in the form of an XML document, in response to a call on the document() function (many processors allow you to write logic that intercepts the URI supplied to the document() function, or you could use a URI that invokes a servlet or a Web service).
Achieving side effects: There are some side effects that are reasonably acceptable, for example writing messages to a log file. These are basically actions that do not affect the subsequent processing of the stylesheet, so the order of events is not critically important. But trying to get round the no-side-effects rule in other ways is nearly always the wrong thing to do, though it can be very tempting. Sooner or later, the optimizer will rearrange your code in a way that stops your extension function from working.
Using XSLT as a job control language: I have seen stylesheets that consist entirely of calls to external services, effectively using XSLT as a scripting language to invoke a sequence of external tasks. XSLT wasn't designed for this role, and the fact that order of execution in XSLT is undefined makes it a poor choice of tool for this job. Use an XML pipeline processor (XProc), a shell script language, or the ant utility.
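The two recommended ways of supplying external data need no extension function at all. In the sketch below, the parameter names, the default file name rates.xml, and the structure of that document are all hypothetical. How the parameters are set from outside (command line, API call) depends on the processor.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Simple values arrive as stylesheet parameters; the defaults apply
       when the caller supplies nothing -->
  <xsl:param name="customer-id" as="xs:string" select="'unknown'"/>
  <xsl:param name="rates-uri" as="xs:string" select="'rates.xml'"/>

  <!-- Structured data arrives as an XML document. The URI could equally
       point at a servlet or Web service that returns XML. A relative URI
       given as a string is resolved against the stylesheet's base URI. -->
  <xsl:variable name="rates" select="document($rates-uri)"/>

  <xsl:template match="/">
    <report customer="{$customer-id}">
      <!-- Assumes a document of the form <rates><rate currency="...">...</rate></rates> -->
      <xsl:for-each select="$rates/rates/rate">
        <rate currency="{@currency}">
          <xsl:value-of select="."/>
        </rate>
      </xsl:for-each>
    </report>
  </xsl:template>

</xsl:stylesheet>
```

Because the data arrives as parameters and documents, the stylesheet stays portable across processors.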
