XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition (120 page)


The regex must not be one that matches a zero-length string. This rules out values such as regex="" or regex="[0-9]*". The reason for this rule is that languages such as Perl have different ways of handling this situation, none of which are completely satisfactory, and which are sensitive to additional parameters such as limit, which XSLT chose not to provide.

The input string is formed by evaluating the select expression, and the processor then analyzes this string to find all substrings that match the regex. The substrings that match the regex are processed using the instructions within the <xsl:matching-substring> element, while the intervening substrings are processed using the instructions in the <xsl:non-matching-substring> element. For example, if the regex is [0-9]+, then any consecutive sequence of digits in the input string is passed to the <xsl:matching-substring> element, and any consecutive sequence of non-digits is passed to the <xsl:non-matching-substring> element.
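As a minimal sketch of the [0-9]+ case just described, the stylesheet below wraps each run of digits in one element and each run of non-digits in another. The input string, the element names tokens, number, and text, and the named entry template main are illustrative assumptions, not taken from the book.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Named entry template, so the sketch needs no source document. -->
  <xsl:template name="main">
    <tokens>
      <xsl:analyze-string select="'Room 12, floor 3'" regex="[0-9]+">
        <!-- Called once for each run of digits. -->
        <xsl:matching-substring>
          <number><xsl:value-of select="."/></number>
        </xsl:matching-substring>
        <!-- Called once for each run of non-digits between the matches. -->
        <xsl:non-matching-substring>
          <text><xsl:value-of select="."/></text>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </tokens>
  </xsl:template>

</xsl:stylesheet>
```

With this input, the result should contain, in order, a text element holding "Room ", a number element holding "12", a text element holding ", floor ", and a number element holding "3".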

Within the <xsl:matching-substring> or <xsl:non-matching-substring> element, the substring in question can be referenced as the context item, using the XPath expression . (a single period). It is also possible within the <xsl:matching-substring> element to refer to the substrings that matched particular parts of the regex: see Captured Groups below.

Because the <xsl:analyze-string> instruction changes the context item, it's often useful to bind a variable to the context node before entering the instruction, so that you can refer to it within the <xsl:matching-substring> and <xsl:non-matching-substring> elements. If you forget to do this, a likely consequence is an error message along the lines “the context item is not a node”.
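The variable-binding advice can be sketched as follows. The source vocabulary is hypothetical: the sketch assumes an input document containing para elements that carry an id attribute, and the output element ref and the variable name para are likewise invented for illustration.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed input: para elements with an id attribute (hypothetical). -->
  <xsl:template match="para">
    <!-- Bind the context node before the context item becomes a string. -->
    <xsl:variable name="para" select="."/>
    <xsl:analyze-string select="." regex="[0-9]+">
      <xsl:matching-substring>
        <!-- Here "." is the matched string, so the node is reached via $para. -->
        <ref para-id="{$para/@id}" value="{.}"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Writing @id on its own inside <xsl:matching-substring> would be evaluated against the matched string rather than the para element, which is the kind of mistake that produces the error described above.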

Neither a matching substring nor a nonmatching substring will ever be zero-length. This means that if two matching substrings are adjacent to each other in the input string, there will be two consecutive calls on the <xsl:matching-substring> element, with no intervening call on the <xsl:non-matching-substring> element.
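A small sketch of adjacent matches, under assumed names: the regex [A-Z][a-z]* (which cannot match a zero-length string, because it requires one uppercase letter) splits a run-together string into words. The input string and the element names words, word, and gap are illustrative only.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template name="main">
    <words>
      <xsl:analyze-string select="'XsltTwoPointZero'" regex="[A-Z][a-z]*">
        <!-- Each match starts where the previous one ended. -->
        <xsl:matching-substring>
          <word><xsl:value-of select="."/></word>
        </xsl:matching-substring>
        <!-- Never invoked for this input: there is no text between matches. -->
        <xsl:non-matching-substring>
          <gap><xsl:value-of select="."/></gap>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </words>
  </xsl:template>

</xsl:stylesheet>
```

The result should hold four word elements (Xslt, Two, Point, Zero) and no gap elements, since an empty nonmatching substring is never passed to <xsl:non-matching-substring>.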

Omitting either the <xsl:matching-substring> element or the <xsl:non-matching-substring> element causes the relevant substrings to be discarded (no output is produced in respect of these substrings).
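One common use of this rule is extraction: supplying only <xsl:matching-substring> keeps the matches and drops everything else. The sketch below assumes an invented input string and the illustrative element names digits and run.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template name="main">
    <digits>
      <xsl:analyze-string select="'tel: 555-0199'" regex="[0-9]+">
        <xsl:matching-substring>
          <run><xsl:value-of select="."/></run>
        </xsl:matching-substring>
        <!-- No xsl:non-matching-substring: 'tel: ' and '-' are discarded. -->
      </xsl:analyze-string>
    </digits>
  </xsl:template>

</xsl:stylesheet>
```

The result should contain two run elements, holding "555" and "0199", with no trace of the surrounding text.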

In working its way through the input string, the processor always looks for the first match that it can find. That is, it looks first for a match starting at the first character of the input string, then for a match starting at the second character, and so on. There are several situations that can result in several candidate matches occurring at the same position (that is, starting with the same character in the input). The rules that apply are:

  • The quantifiers * and + are greedy: they match as many characters as they can, consistent with the regular expression as a whole succeeding. For example, given the input Here [1] or there [2], the regex \[.*\] will match the string [1] or there [2].
  • The quantifiers *? and +? are non-greedy: they match as few characters as they can, consistent with the regular expression as a whole succeeding. For example, given the input Here [1] or there [2], the regex \[.*?\] will match the strings [1] and [2].
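The two bullet examples can be tried side by side with a sketch like the one below, which runs both regexes over the same input. The variable name input and the element names result, greedy, non-greedy, and match are assumptions made for illustration.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template name="main">
    <xsl:variable name="input" select="'Here [1] or there [2]'"/>
    <result>
      <greedy>
        <!-- .* runs on to the last closing bracket in the string. -->
        <xsl:analyze-string select="$input" regex="\[.*\]">
          <xsl:matching-substring>
            <match><xsl:value-of select="."/></match>
          </xsl:matching-substring>
        </xsl:analyze-string>
      </greedy>
      <non-greedy>
        <!-- .*? stops at the first closing bracket it can. -->
        <xsl:analyze-string select="$input" regex="\[.*?\]">
          <xsl:matching-substring>
            <match><xsl:value-of select="."/></match>
          </xsl:matching-substring>
        </xsl:analyze-string>
      </non-greedy>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

The greedy element should contain a single match, [1] or there [2], while the non-greedy element should contain two matches, [1] and [2]. Note that the regex attribute is an attribute value template, so a regex containing curly braces would need them doubled; neither regex here uses any.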
