XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition
Authors: Michael Kay
Static and Dynamic Type Checking
As I said in the introduction to this chapter, one of the major purposes of the type system in a programming language is to enable programming errors to be detected and corrected. The best time to do this, where possible, is at compile time.
Very often, you will compile and execute an XSLT stylesheet, or an individual XPath expression, as a single indivisible operation. You may therefore feel that there isn't much difference between detecting an error at compile time and detecting it at runtime. Indeed, if you use XPath expressions from a programming language such as Java, it's likely that the XPath expressions won't be compiled until the Java program is executed, so in a sense all errors become runtime errors. However, there is still a big difference, because an error that's detected at compile time doesn't depend on the input data. This means that it will be reported every time you process the XPath expression, which means it can't remain lurking in the code until some chance condition in the data reveals a latent bug that got through all your tests.
I had a real-life example of this recently. In Chapter 20, there is a stylesheet whose task is to perform a knight's tour of the chessboard: a tour, starting from a user-specified square, in which the knight visits every square on the chessboard exactly once. I published an XSLT 1.0 version of this stylesheet in an earlier edition of this book, and I have also written an XQuery 1.0 version, which is published with the Saxon software distribution. Part of the algorithm involves backtracking when the knight gets stuck in a blind alley; however, I never found a way of testing the backtracking, because in every case I tried, the knight got all the way around the board without ever getting stuck. In fact, I said in the book that although I couldn't prove it, I believed that the backtracking code would never be invoked.
Three years after I first wrote the code, one of my readers discovered that if the knight starts on square f1, it gets stuck on move 58 and has to retrace its steps. The same user has since reported that this is the only starting square where this happens. The way he made the discovery was that in the XQuery version of the algorithm, the backtracking code was wrong. I had coded two arguments to a function call the wrong way around, and when the function call was executed, this was detected, because one of the values had the wrong type. So type checking detected the error, but static type checking (that is, compile time checking) could potentially have detected it three years earlier.
But static type checking also has a downside: it makes it much harder to cope with unpredictable data. With strict static type checking, every expression must satisfy the compiler that it can never fail at runtime with a type error. Let's see what happens if, for example, you have a price attribute whose value is either a decimal number or the string N/A. You can define this in XML Schema as follows:
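One way to write such a definition, offered only as a sketch, is a union of xs:decimal with a restriction of xs:string that permits the single value N/A. The type name priceType and the enclosing product element are assumptions made for illustration; other formulations are possible.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical type name: a price is either a decimal or the literal string N/A -->
  <xs:simpleType name="priceType">
    <xs:union memberTypes="xs:decimal">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="N/A"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:union>
  </xs:simpleType>

  <!-- Assumed element layout: each product carries an optional price attribute -->
  <xs:element name="product">
    <xs:complexType>
      <xs:attribute name="price" type="priceType"/>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

With a union type like this, the typed value of the attribute is an xs:decimal when the content is a valid decimal, and a string otherwise.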
Now let's suppose that you want to find the average price of those products where the price is known. Your first attempt might look like this:
avg(product/@price[. != "N/A"])
This looks sensible, but under strict static type checking, it will fail to compile. There are two reasons. Firstly, you can't compare a number with a string, so the expression . != "N/A" isn't allowed, on the grounds that the value of . (that is, the typed value of the price attribute) might be a number. Secondly, although you and I can tell that all the attributes that get through the filter in square brackets will be numeric, the compiler isn't so clever, and will report an error on the grounds that some of the items in the sequence being averaged might be strings rather than numbers.
The first of these two errors will be reported even if type checking is delayed until runtime, so in this case the static type checker has done us a service by reporting the error before it happened. The second error is a false alarm. At runtime, all the attribute values being averaged will actually be numeric, so the error of including a string in the sequence will never occur.
This example is designed to illustrate that static type checking is a mixed blessing. It will detect some errors early, but it will also report many false alarms. The more you are dealing with unpredictable or semi-structured data, the more frequent the false alarms will become. With highly structured data, static type checking can be a great help in enabling you to write error-free code; but with loosely structured data, it can become a pain in the neck. Because XML is designed to handle such a wide spectrum of different kinds of data, the language designers decided that static type checking should be optional.
Whether you use static or dynamic type checking, the first error in our example above will need to be corrected. One way to do this is to force the value of the attribute to be converted to a string before the comparison, like this:
avg(product/@price[string(.) != "N/A"])
For the other error (the false alarm) we don't need to take any further action in the case of a system that only does dynamic type checking. However, if we want the expression also to work with systems that do static type checking, we will need to change it. The simplest approach seems to be:
avg(product/xs:decimal(@price[string(.) != "N/A"]))
The cast to xs:decimal here doesn't actually do anything at runtime, because the operand will always be an xs:decimal already. But it keeps the static type checker happy, because the system can now tell at compile time that the values input to the avg() function will all be xs:decimal values.
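To see how this expression might sit in a stylesheet, here is a minimal sketch. The products root element, the average result element, and the sample prices in the comment are all assumptions made for illustration; only the XPath expression itself comes from the discussion above.

```xml
<!-- Assumed input (hypothetical):
     <products>
       <product price="10.00"/>
       <product price="N/A"/>
       <product price="20.00"/>
     </products> -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:template match="/products">
    <average>
      <!-- The predicate drops N/A prices; xs:decimal() makes the item type explicit -->
      <xsl:value-of
          select="avg(product/xs:decimal(@price[string(.) != 'N/A']))"/>
    </average>
  </xsl:template>

</xsl:stylesheet>
```

For a product whose price is N/A, the predicate selects nothing, so xs:decimal() receives an empty sequence and returns an empty sequence; only the remaining prices reach avg().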
Looking back at the example:
avg(product/@price[. != "N/A"])
it might have occurred to you that under XPath 1.0, apart from the fact that the avg() function was not available, this would have worked quite happily, with neither static nor dynamic errors. That's because XPath 1.0 treated all data in source documents as being untyped. You could compare the value of an attribute to a string, and it would treat it as a string, and you could then take an average, and it would treat the same value as a number. You can do the same thing in XPath 2.0, simply by switching off schema processing: if there is no schema, or if you switch off schema processing, then the attributes are going to be treated as xs:untypedAtomic values, and will adapt themselves to whatever operation you want to perform, just as with XPath 1.0. If you like this way of working, there is nothing to stop you carrying on this way. However, you should be aware of the consequences: many programming errors in XPath 1.0 go undetected, or are very difficult to debug, because the system in effect tries to guess what you meant, and it sometimes guesses wrong. For example, if you compare an attribute to a string using the = operator, XPath 1.0 guesses that you wanted a string comparison (so, for an attribute n whose value is 4, @n = "04" is false), while if you compare the same attribute to the same string using the <= operator, XPath 1.0 guesses that a numeric comparison was intended (so @n <= "04" is true). Sooner or later, this is going to trip you up. With a schema-aware XPath 2.0 processor, you have to be explicit about whether you want a string comparison or a numeric comparison, by explicitly converting one of the operands to the type of the other.
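The explicit conversions described here can be sketched in a small stylesheet. The element names comparisons, numeric, and textual are hypothetical, and literal values stand in for typed data from a source document.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Runs against any source document; the result element names are illustrative -->
  <xsl:template match="/">
    <comparisons>
      <!-- Numeric comparison: the string is converted to a number first -->
      <numeric><xsl:value-of select="4 = xs:integer('04')"/></numeric>
      <!-- String comparison: the number is converted to a string first -->
      <textual><xsl:value-of select="string(4) = '04'"/></textual>
    </comparisons>
  </xsl:template>

</xsl:stylesheet>
```

The first comparison yields true and the second false. In each case the kind of comparison is stated in the code rather than left for the processor to guess.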