XSLT 2.0 and XPath 2.0 Programmer's Reference, 4th Edition (554 page)


Signature

Argument   Type        Meaning
value      xs:string   The input string, to which percent-encoding is to be applied
Result     xs:string   The percent-encoded string

Effect

The result string is formed from the input string by escaping special characters according to the rules defined in RFC 3986 (http://www.ietf.org/rfc/rfc3986.txt). Special characters are escaped by first encoding them in UTF-8, then representing each byte of the UTF-8 encoding in the form %HH, where HH represents the byte as two hexadecimal digits. The digits A–F are always in upper case.

All characters are escaped except the following:

  • A-Z a-z 0-9
  • hyphen -, underscore _, period ., and tilde ~
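One consequence of this rule is that URI delimiters such as the colon, slash, and question mark are escaped along with everything else, so the function is suited to individual path segments or query values rather than to a complete URI. The following is a minimal sketch, assuming an XSLT 2.0 processor and any source document; the sample strings and the example.com address are illustrative and do not come from the text above.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Only unreserved characters here, so the string is returned unchanged -->
    <xsl:value-of select="encode-for-uri('A-z_0.9~')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Delimiters are escaped too: gives http%3A%2F%2Fexample.com%2Fa%20b -->
    <xsl:value-of select="encode-for-uri('http://example.com/a b')"/>
  </xsl:template>

</xsl:stylesheet>
```

The second result is no longer usable as an absolute URI, which is why it is usually safer to escape each component separately and then assemble the URI around the escaped pieces.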

Examples

Expression                    Result
encode-for-uri("simple.xml")  "simple.xml"
encode-for-uri("my doc.xml")  "my%20doc.xml"
encode-for-uri("f+o.pdf")     "f%2Bo.pdf"
encode-for-uri("Grüße.html")  "Gr%C3%BC%C3%9Fe.html"

Usage

This function is designed for use by applications that need to construct URIs.

The rules for URIs (given in RFC 3986, http://www.ietf.org/rfc/rfc3986.txt) make it clear that a string in which special characters have not been escaped is not a valid URI. In many contexts where URIs are required, both in XPath functions such as the doc() function and in places such as the href attribute of the <a> element in HTML, the URI should in theory be fully escaped according to these rules. In practice, software is very often tolerant and accepts unescaped URIs, but applications shouldn't rely on this.
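As an illustration of constructing such a URI, the sketch below escapes a file name and a query value before placing them in an href attribute. It assumes an XSLT 2.0 processor; the parameter names file-name and query, the docs/ path, and the q keyword are hypothetical and chosen only for the example.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Hypothetical parameters, for illustration only -->
  <xsl:param name="file-name" select="'my doc.xml'"/>
  <xsl:param name="query" select="'f+o'"/>

  <xsl:template match="/">
    <!-- Escape each component, then assemble the URI around the results -->
    <a href="{concat('docs/', encode-for-uri($file-name),
                     '?q=', encode-for-uri($query))}">
      <xsl:value-of select="$file-name"/>
    </a>
  </xsl:template>

</xsl:stylesheet>
```

With the default parameter values, the attribute value comes out as docs/my%20doc.xml?q=f%2Bo. The same pattern can be used when calling doc(), for example doc(concat('data/', encode-for-uri($file-name))), where data/ is again an assumed location.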

The rules for escaping special characters (officially called percent-encoding) are rather peculiar. To escape a character, it is first encoded in UTF-8, which in general represents a character as one or more octets (bytes). Each of these bytes is then substituted into the string using the notation %HH, where HH is the value of the byte in hexadecimal. For example, the space character is represented as %20, and the euro symbol as %E2%82%AC. Although RFC 3986 allows the hexadecimal digits A-F to be in either upper or lower case, the encode-for-uri() function mandates upper case, to ensure that escaped URIs can be compared as strings.
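To make the two steps concrete, the rule can be spelled out by hand in XSLT 2.0. This is only a sketch for comparison with the built-in function, not how any processor implements it; the ex namespace and the functions ex:utf8-bytes, ex:percent, and ex:encode are hypothetical names introduced for the example.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ex="http://example.com/ns/sketch"
    exclude-result-prefixes="xs ex">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: the UTF-8 octets of one codepoint, as integers -->
  <xsl:function name="ex:utf8-bytes" as="xs:integer+">
    <xsl:param name="cp" as="xs:integer"/>
    <xsl:sequence select="
      if ($cp lt 128) then $cp
      else if ($cp lt 2048) then
        (192 + ($cp idiv 64), 128 + ($cp mod 64))
      else if ($cp lt 65536) then
        (224 + ($cp idiv 4096), 128 + (($cp idiv 64) mod 64),
         128 + ($cp mod 64))
      else
        (240 + ($cp idiv 262144), 128 + (($cp idiv 4096) mod 64),
         128 + (($cp idiv 64) mod 64), 128 + ($cp mod 64))"/>
  </xsl:function>

  <!-- Hypothetical helper: one octet written as %HH with upper-case digits -->
  <xsl:function name="ex:percent" as="xs:string">
    <xsl:param name="byte" as="xs:integer"/>
    <xsl:variable name="hex" select="'0123456789ABCDEF'"/>
    <xsl:sequence select="concat('%',
        substring($hex, ($byte idiv 16) + 1, 1),
        substring($hex, ($byte mod 16) + 1, 1))"/>
  </xsl:function>

  <!-- Hypothetical reimplementation of the rule, for comparison only -->
  <xsl:function name="ex:encode" as="xs:string">
    <xsl:param name="value" as="xs:string"/>
    <xsl:sequence select="string-join(
        for $cp in string-to-codepoints($value) return
          if (matches(codepoints-to-string($cp), '^[A-Za-z0-9\-_.~]$'))
          then codepoints-to-string($cp)
          else string-join(
                 for $b in ex:utf8-bytes($cp) return ex:percent($b), ''),
        '')"/>
  </xsl:function>

  <xsl:template match="/">
    <!-- The character reference below is the euro symbol, U+20AC -->
    <xsl:value-of separator=" "
        select="ex:encode('&#x20AC;'), encode-for-uri('&#x20AC;')"/>
  </xsl:template>

</xsl:stylesheet>
```

For the euro symbol (codepoint 8364), the three-octet branch yields 226, 130, and 172, which ex:percent writes as %E2, %82, and %AC, so both calls should produce %E2%82%AC. Because the hexadecimal digits are always upper case, two strings escaped this way can be compared with a plain eq test.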
