XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition
Author: Michael Kay
But remember that this is likely to work only if the output is serialized by the XSLT processor; it won't work if you write the result to a DOM.
Character Maps as a Substitute for disable-output-escaping
Character maps are less powerful than disable-output-escaping, because you can't switch them on and off for different parts of the result tree. But this is also their strength. The problem with disable-output-escaping is that it requires some extra information to pass between the transformation engine and the serializer, in addition to the information that's defined in the data model. (As evidence for this, look at the clumsy way that disable-output-escaping requests are encoded in a SAXResult stream in the Java JAXP interface.) This information is generally lost if you want to pass the result tree to another application before serializing it. The problem gets worse in XSLT 2.0, which allows temporary trees and parentless text nodes to be created and processed within the course of a transformation. One of the difficulties in designing this feature was whether a request to disable output escaping should be meaningful when the data being written was not being passed straight to the serializer, but was being written to a temporary tree or a parentless text node.
Most of the things that can be done with disable-output-escaping, including the bad things, can also be done with character maps. The big advantage of character maps is that they don't distort the data model, which means that they don't impact your ability to use a stylesheet-based transformation as a component in an application with clean interfaces to other components.
If you want to convert code that was written to use disable-output-escaping to use character maps instead, the most direct approach is to define substitutes for the characters that are changed by XML escaping:
]>
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:doe="http://www.wrox.com/xslt/ch15/doe">
'&doe-lt;&doe-amp;&doe-gt;&doe-apos;&doe-quot;')"/>
Then, wherever the existing code uses
Although this mechanical replacement of disable-output-escaping by character maps will always work, there may often be better ways of doing it in particular circumstances.
Remember that if you expose the unserialized result tree to another application, it will see the private-use characters such as xE801 in text and attribute nodes.
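The mechanical replacement described above can be sketched roughly as follows. This is an illustrative reconstruction, not the book's own listing: the entity names, the doe namespace, and the code point xE801 come from the surrounding text, but the remaining code points (xE802 to xE805) and the names doe:map and doe:escape are assumptions chosen for the example.

```xml
<?xml version="1.0"?>
<!-- Sketch only. Code points other than xE801, and the names doe:map
     and doe:escape, are assumptions made for illustration. -->
<!DOCTYPE xsl:stylesheet [
  <!ENTITY doe-lt   "&#xE801;">
  <!ENTITY doe-amp  "&#xE802;">
  <!ENTITY doe-gt   "&#xE803;">
  <!ENTITY doe-apos "&#xE804;">
  <!ENTITY doe-quot "&#xE805;">
]>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:doe="http://www.wrox.com/xslt/ch15/doe"
    exclude-result-prefixes="xs doe">

  <!-- Ask the serializer to apply the character map below. -->
  <xsl:output method="xml" use-character-maps="doe:map"/>

  <!-- Each private-use character is written out as the raw,
       unescaped character it stands for. -->
  <xsl:character-map name="doe:map">
    <xsl:output-character character="&doe-lt;"   string="&lt;"/>
    <xsl:output-character character="&doe-amp;"  string="&amp;"/>
    <xsl:output-character character="&doe-gt;"   string="&gt;"/>
    <xsl:output-character character="&doe-apos;" string="'"/>
    <xsl:output-character character="&doe-quot;" string="&quot;"/>
  </xsl:character-map>

  <!-- Replace the five characters affected by XML escaping with
       their private-use substitutes. -->
  <xsl:function name="doe:escape" as="xs:string">
    <xsl:param name="in" as="xs:string"/>
    <xsl:sequence select="translate($in, '&lt;&amp;&gt;''&quot;',
        '&doe-lt;&doe-amp;&doe-gt;&doe-apos;&doe-quot;')"/>
  </xsl:function>

  <!-- Where XSLT 1.0 code would write
         <xsl:value-of select="..." disable-output-escaping="yes"/>
       this sketch passes the value through doe:escape instead. -->
  <xsl:template match="/">
    <out><xsl:value-of select="doe:escape('&lt;br/&gt;')"/></out>
  </xsl:template>

</xsl:stylesheet>
```

When the result is serialized by the XSLT processor, the substitute characters should appear in the output as literal markup characters. If the result tree is instead handed to another application unserialized, that application sees the private-use characters, as noted above.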
Summary
This chapter has described the four serialization methods XML, HTML, XHTML, and TEXT, which can be invoked to process the XSLT result tree. It also explained the new XSLT 2.0 facility of character maps, and the XSLT 1.0 disable-output-escaping capability that it replaces, both of which are there to get you out of sticky corners when the standard serialization mechanisms prove inadequate.
This provides a nice link into the next chapter, which describes the range of techniques that allow vendors and users to extend the capability of XSLT when there is a need to do things that are outside the scope of the standard.
Part III
Exploitation
Chapter 16: Extensibility
Chapter 17: Stylesheet Design
Chapter 18: Case Study: XMLSpec
Chapter 19: Case Study: A Family Tree
Chapter 20: Case Study: Knight's Tour
Chapter 16
Extensibility
Previous chapters have discussed standard features of the XSLT language. This chapter discusses what happens when you need to stray beyond the XSLT 2.0 language specification. It's concerned with questions such as:
There is some interesting history here. XSLT 1.0 allowed stylesheets to call user-written extension functions but provided no standard way of writing them. The draft XSLT 1.1 specification defined a general mechanism for creating extension functions written in any language and then defined detailed interfaces for Java and JavaScript (or ECMAScript, to give it its vendor-neutral name). This specification was published as a working draft but was subsequently withdrawn. There were a number of reasons for this, one of which was simply that it was overtaken by the more ambitious XSLT 2.0 initiative. But part of the reason was that the proposals for standardizing extension function interfaces attracted heavy public criticism (see http://xml.coverpages.org/withdraw-xslScript.html). It's difficult in retrospect to summarize the arguments that were raised against the idea, but they probably fell into three categories: some people thought extension functions were a bad idea in principle and should not be encouraged; some people disapproved of singling out two languages (Java and JavaScript) for special treatment; and some people felt that the W3C shouldn't be putting language bindings into the core XSLT specification, and that the job should be done in separate specifications, preferably produced by a different organization.
The result of this minor furor is that there is no defined interface for writing extension functions, either in XSLT 1.0 or in XSLT 2.0. However, conventions have emerged at least for XSLT 1.0 (the draft 1.1 specification was influenced by these conventions, and in turn exerted its own influence on the products, despite being abandoned), and it is worth giving these some space.
At the time of writing this edition, only a limited number of XSLT 2.0 processors are available, and it is difficult to see trends emerging as to what capabilities vendors will choose to provide. However, there's no reason to believe that this will be significantly different from the capabilities often found in XSLT 1.0 processors. Some of the examples in this chapter therefore relate to XSLT 1.0 processors such as MSXML from Microsoft and Xalan-J from Apache.
What Vendor Extensions Are Allowed?
The XSLT 2.0 language specification makes no distinction between what vendors are allowed to do, and what users and third parties are allowed to do. For example, it says that the set of languages supported by the format-date() function is implementation-defined. This can be interpreted in two ways:
Nowhere in the XSLT specification does it say that implementors must provide facilities for users to define their own extensions. Many implementations will choose to do so, but to find out what extensibility is permitted by the language, we need to look at two things: firstly, the information that is defined to be part of the context or environment, and secondly, the features of the language whose behavior is implementation-defined. There are detailed lists of these features in the W3C specification, but they fall into a few broad categories.
When the specification says that the behavior of a particular feature is implementation-defined, this places an onus on the vendor of a conformant product to describe in the product documentation what choices they have made. There are also some features of the language that are implementation-dependent: the difference here is that vendors are not expected to document the exact behavior of the product. An example of an implementation-dependent feature is the maximum depth of recursion that is permitted. This will depend on a great many factors outside the software vendor's direct control, so it's not reasonable to expect a definitive statement.
Extension Functions
Extending the library of functions that can be called from XPath expressions has proved to be by far the most important way in which vendors extend the capability of the language, and so we will concentrate most of our attention on this particular extensibility mechanism.
When Are Extension Functions Needed?
There are a number of reasons you might want to call an extension function from your stylesheet:
There are two ways of using extension functions in XSLT. You can write your own extension functions, or you can call extension functions that already exist. These functions might be provided by your XSLT vendor, or they might come from a third-party library such as:
Actually some of these libraries are implemented in XSLT, which means that the functions they contain are not, strictly speaking, extensions at all. But the way you write your code to call them is the same either way, so the distinction isn't really important.
Many XSLT vendors designed their interfaces for Java and JavaScript so that the extensive class libraries available in both these languages would be directly accessible to the stylesheet, with no further coding required. This is certainly true for mathematical functions, string manipulation, and date handling. Which language you choose to use to write extension functions is a matter of personal choice, though it will be heavily constrained by the XSLT processor you are using. With a Java-based processor such as Saxon or Xalan-J, the natural choice is to write extension functions in Java. With Microsoft processors, the natural choice is a .NET language such as C#. If you are using the Gestalt processor, it is probably because your favorite language is Eiffel. Processors written in C or C++ tend to require a more complex procedure for linking extension functions, if they are supported at all.
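As a rough illustration of calling an existing Java class library directly from a stylesheet, the sketch below binds a namespace prefix to java.lang.Math and calls its sqrt method. The binding convention used here (a namespace URI of the form java: followed by the class name) is the style used by Saxon and is an assumption about the processor; other products use different URI conventions, and some editions or configurations do not allow such calls at all. The function-available() test guards against that case.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:math="java:java.lang.Math"
    exclude-result-prefixes="math">

  <!-- Assumption: the processor maps the namespace "java:java.lang.Math"
       to the static methods of that Java class (Saxon-style binding). -->
  <xsl:template name="main">
    <root>
      <xsl:choose>
        <!-- Only call the extension function if the processor provides it. -->
        <xsl:when test="function-available('math:sqrt')">
          <xsl:value-of select="math:sqrt(2)"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:text>math:sqrt is not available in this processor</xsl:text>
        </xsl:otherwise>
      </xsl:choose>
    </root>
  </xsl:template>

</xsl:stylesheet>
```

The stylesheet has no match templates, so it needs to be run with main as the named initial template.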
When Are Extension Functions Not Needed?
There is probably a tendency for newcomers to XSLT to write extension functions simply because they haven't worked out how to code the logic in an XSLT stylesheet function. Slipping back into a programming language you have used for years, rather than battling with an unfamiliar one, is always going to be tempting when you have deadlines to meet. It's understandable, but it's not the right thing to do.
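To give a sense of how much can be done without leaving the language, here is a minimal sketch of a stylesheet function written entirely in XSLT 2.0. The function name f:reverse and its namespace URI are hypothetical, chosen only for this example; the logic uses nothing but standard XPath 2.0 functions.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/local-functions"
    exclude-result-prefixes="xs f">

  <!-- Hypothetical stylesheet function: reverses the characters of a
       string using standard functions only, so no extension is needed. -->
  <xsl:function name="f:reverse" as="xs:string">
    <xsl:param name="in" as="xs:string"/>
    <xsl:sequence
        select="codepoints-to-string(reverse(string-to-codepoints($in)))"/>
  </xsl:function>

  <!-- Named entry point that exercises the function. -->
  <xsl:template name="main">
    <result><xsl:value-of select="f:reverse('XSLT')"/></result>
  </xsl:template>

</xsl:stylesheet>
```

Run with main as the initial template, this produces a result element containing TLSX. A function like this is called from XPath expressions in exactly the same way as an extension function would be, but it remains portable across conformant XSLT 2.0 processors.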
There are other wrong reasons for using extension functions. These include: