XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition (694 page)


Most of the time, a rule-based stylesheet creates a result tree with a structure similar to the source tree, with most of the source text appearing in the same order in the result document, usually with different tags. The more closely this describes the transformation you want to do, the closer your stylesheet will be to the example shown above. However, this doesn't mean that the processing has to be purely sequential. You can process chunks of the tree more than once using modes, you can reorder the nodes of the tree, and you can grab data from ancestor nodes, all without deviating from the rule-based design pattern.

The characteristic feature of a rule-based stylesheet is that there is generally one template rule for each class of object found in the source document. I use the term class very loosely here: the “classes of object” might correspond to types in a schema, or to element names, or perhaps to element names qualified by their context or content.

Of course, it's possible to mix design patterns, particularly if your source document contains a mixture of “data-oriented” and “text-oriented” structures (an example might be a job application form). Then it's quite appropriate to use a navigational pattern for the regular structures and a rule-based pattern for the less regular. For example, I created a Web site that provides information about concert soloists. This contains a mixture of structured data (their name, instrument or voice, photo, and contact details), semi-structured data about the performances they have taken part in, and unstructured text. The stylesheet to display the data contains a corresponding mixture of coding styles. The larger and more complex your stylesheet, the more likely it is to contain examples of each of the design patterns.

Computational Stylesheets

Computational stylesheets are the most complex of the four design patterns. They arise when there is a need to generate nodes in the result tree that do not correspond directly to nodes in the source tree. With XSLT 1.0, this happened most commonly when dealing with structure in the source document that is not explicit in its markup. For example:

  • A text field in the source might consist of a comma-separated list of items that is to be displayed as a bulleted list in the output.
  • There might be a need to generate section elements in the output where a section is not explicit in the source but is defined as comprising a heading element and all its following sibling elements up to the next heading element.

With XSLT 2.0, many of these problems can be tackled using new facilities built into the language: the first of these examples can be handled using <xsl:analyze-string>, and the second using <xsl:for-each-group>. However, sooner or later you will exhaust the capabilities of these constructs and need to write a stylesheet in the form of a general-purpose program. Examples of such problems include the following:

  • Starting a new page (or other unit) when a running total has reached some threshold value
  • Analyzing a parts explosion to see if it contains any cycles
  • Creating graphical representations of numeric data using the vector graphics standard SVG as the output format
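Before turning to such general-purpose problems, here is a minimal sketch of how the two structural examples given earlier might be handled with built-in XSLT 2.0 instructions. It assumes the facilities in question are xsl:analyze-string and xsl:for-each-group. The element names doc, items, h1, section, and result are hypothetical, chosen only for illustration.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input:
       <doc>
         <items>apples, pears, plums</items>
         <h1>First</h1><p>...</p><p>...</p>
         <h1>Second</h1><p>...</p>
       </doc> -->

  <xsl:template match="doc">
    <result>
      <xsl:apply-templates select="items"/>
      <!-- Each group starts at an h1 and runs up to (not including) the next h1.
           Any elements before the first h1 would form a group of their own. -->
      <xsl:for-each-group select="* except items" group-starting-with="h1">
        <section>
          <xsl:copy-of select="current-group()"/>
        </section>
      </xsl:for-each-group>
    </result>
  </xsl:template>

  <!-- Turn a comma-separated text field into a bulleted list.
       The regex matches the separators, so the list items are the
       non-matching substrings. The tokenize() function would also work. -->
  <xsl:template match="items">
    <ul>
      <xsl:analyze-string select="normalize-space(.)" regex="\s*,\s*">
        <xsl:non-matching-substring>
          <li><xsl:value-of select="."/></li>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </ul>
  </xsl:template>

</xsl:stylesheet>
```

Neither template needs a loop counter or a mutable variable: the grouping and the splitting are both expressed declaratively.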

When you write computational stylesheets, you invariably run up against the fact that XSLT does not have an assignment statement, and that it is therefore not possible to write loops in the way you are probably used to in other languages. So you need to understand some of the concepts of functional programming, which the following section tries to explain.
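As a rough illustration of what replaces the familiar loop, the following sketch computes running totals using recursion rather than a variable that is updated on each iteration. The function name f:running-totals and its namespace URI are hypothetical. This is one possible approach, not the only one.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/functions"
    exclude-result-prefixes="xs f">

  <!-- Returns the running totals of $values, starting from $so-far.
       No variable is ever reassigned: each recursive call receives
       the new total as a parameter. -->
  <xsl:function name="f:running-totals" as="xs:decimal*">
    <xsl:param name="values" as="xs:decimal*"/>
    <xsl:param name="so-far" as="xs:decimal"/>
    <xsl:if test="exists($values)">
      <xsl:variable name="total" select="$so-far + $values[1]"/>
      <xsl:sequence
          select="$total, f:running-totals(subsequence($values, 2), $total)"/>
    </xsl:if>
  </xsl:function>

  <xsl:template match="/">
    <!-- For the values 10, 20, 30 the running totals are 10, 30, 60 -->
    <totals>
      <xsl:value-of select="f:running-totals((10, 20, 30), 0)"/>
    </totals>
  </xsl:template>

</xsl:stylesheet>
```

A problem such as starting a new page when a running total reaches a threshold could be built on the same pattern, by testing the total inside the function before recursing.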

Programming without Assignment Statements

Back in 1968, the renowned computer scientist Edsger Dijkstra published a paper under the title Go To Statement Considered Harmful. His thesis, suggesting that programs should be written without goto statements, shattered the world as most programmers saw it. Until then they had been familiar with early dialects of FORTRAN and COBOL in which the vast majority of decisions in a program were implemented by using a construct that mapped directly to the conditional jump instruction in the hardware: if condition goto label. Even the design notation of the day, the ubiquitous flowchart drawn in pencil using a clear plastic template, represented control flow in this way.

Dijkstra argued that structured programs, written using if-then-else and while-do constructs instead of goto statements, were far less likely to contain bugs and were far more readable and therefore maintainable. The ideas were fiercely controversial at the time, especially among practicing programmers, and for years afterwards the opponents of the idea would challenge the structured programming enthusiasts with arguments of the form, “OK, so how do you do this without a goto statement?”

Today, however, the battle is won, and the goto statement has been consigned to history. Modern languages like Java don't provide a goto statement, and we no longer miss it.

But for just as long, there has been another group of enthusiasts telling us that assignment statements are considered harmful. Unlike Dijkstra, these evangelists have yet to convince a skeptical world that they are right, though there has always been a significant band of disciples who have seen the benefits of the approach.

This style of coding, without assignment statements, is called functional programming. The earliest and most famous functional programming language was Lisp (sometimes ridiculed as “Lots of Irritating Superfluous Parentheses”), while more modern examples include ML, Haskell, and Scheme. (See, for example, Simply Scheme: Introducing Computer Science by Brian Harvey and Matthew Wright, MIT Press, 1999.)

XSLT is a language without assignment statements, and although its syntax is very different from these languages, its philosophy is based on the concepts of functional programming. It is not a full-fledged functional programming language because you cannot manipulate functions in the same way as data, but in most other respects, it fits into this category of language. If you want to do anything complicated, you must get used to programming without assignment statements. At first, it probably won't be easy: just as early FORTRAN and COBOL programmers instinctively reached for the goto statement as the solution to every problem, if your background is in languages like C or Visual Basic, or even Java, you will just as naturally cherish the assignment statement as your favorite all-purpose tool.

So what's wrong with assignment statements, and why aren't they available in XSLT?

The crux of the argument is that it's the assignment statements that impose a particular order of execution on a program. Without assignment statements, we can do things in any order, because the result of one statement can no longer depend on what state the system was left in by the previous statement. Just as the goto statement mirrors the jump instruction in the hardware, so the assignment statement mirrors the store instruction, and the reason we have assignment statements in our programming languages today is that they were designed to take advantage of sequential von Neumann computers with jump and store instructions. If we want to free ourselves from sequential thinking modeled on sequential hardware architecture, we should find a way of describing what effect we want to achieve, rather than saying what sequence of steps the machine should take in order to achieve it.

The idea of a functional program is to describe the output as a function of the input. XSLT is a transformation language; it is designed to transform an input document into an output document. So, we can regard a stylesheet as a function that defines this transformation: a stylesheet is a function O=S(I) where I is the input document, S is the stylesheet, and O is the output document. Recall the statement made by James Clark at the 1995 Paris workshop, which I quoted in Chapter 1, page 28:

A DSSSL stylesheet very precisely describes a function from SGML to a flow object tree.

This concept clearly remained a key part of the XSLT vision throughout the development of the language. (And, indeed, the flow objects of DSSSL [Document Style Semantics and Specification Language] eventually became the Formatting Objects of XSL-FO.)

We're using the word function here in something close to its mathematical sense. Languages like FORTRAN and Visual Basic have borrowed the word to mean a subroutine that returns a result, but the mathematical concept of a function is not that of an algorithm or sequence of steps to be performed, rather it is a statement of a relationship. The square-root function defines a relationship between 3 and 9, namely 3=sqrt(9). The essence of a function is that it is a fixed, constant, reliable relationship, and evaluating it doesn't change the world. When you ask me “what's the square root of 9 if you work it out?” I can honestly reply “exactly the same as if I don't.” I can say this because square root is a pure function; it gives the same answer whoever calls it and however often they call it, and calling it once doesn't change the answer it gives next time; in fact, it doesn't change anything.

The nice property of pure functions is that they can be called any number of times, in any order, and produce the same result every time. If I want to calculate the square root of every integer between zero and a thousand, it doesn't matter whether I start at zero and work up, or start at a thousand and work down, or whether I buy a thousand and one computers and do them all at the same time; I know I will get the same answer. Pure functions have no side effects.
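The order-independence described above can be sketched in XSLT 2.0 itself. XPath 2.0 has no built-in square-root function, so this example uses a hypothetical stylesheet function f:square as a stand-in. The function name and namespace URI are assumptions made for illustration.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/functions"
    exclude-result-prefixes="xs f">

  <!-- A pure function: the result depends only on the argument -->
  <xsl:function name="f:square" as="xs:integer">
    <xsl:param name="n" as="xs:integer"/>
    <xsl:sequence select="$n * $n"/>
  </xsl:function>

  <xsl:template match="/">
    <!-- Evaluate from zero up to a thousand -->
    <xsl:variable name="ascending" as="xs:integer*"
        select="for $i in 0 to 1000 return f:square($i)"/>
    <!-- Evaluate from a thousand down to zero, then put the
         results back into ascending order for comparison -->
    <xsl:variable name="descending" as="xs:integer*"
        select="reverse(for $i in reverse(0 to 1000) return f:square($i))"/>
    <same>
      <xsl:value-of select="deep-equal($ascending, $descending)"/>
    </same>
  </xsl:template>

</xsl:stylesheet>
```

Because f:square has no side effects, the comparison yields true whichever direction the processor evaluates in, and a processor would be equally free to evaluate the calls in parallel.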

An assignment statement isn't like that. The effect of an assignment statement “if you work it out” is not the same as if you don't. When you write x=x+1; (a construct, incidentally, which most of us found completely absurd when we were first introduced to programming), the effect depends very much on how often the statement is executed. When you write several assignment statements, for example:
