XPath and XPointer/XPointer Background

From WikiContent

< XPath and XPointer
Revision as of 09:45, 7 March 2008 by Docbook2Wiki (Talk)
(diff) ←Older revision | Current revision (diff) | Newer revision→ (diff)
Jump to: navigation, search
XPath and XPointer

This chapter introduces the concepts and language of the XPointer 1.0 family of specifications. XPointer 1.0 Candidate Recommendation was a single document in September 2001 but was fragmented into a group of smaller specifications in July 2002. Most of these specifications were published as Last Call Working Drafts and are likely to move forward through the W3C process without significant changes, though the largest of them, the XPointer xpointer( ) Scheme, may have a longer process ahead.

To quickly rehash some of the material covered in Chapter 1, the purpose of the XPointer specifications is to provide a formal mechanism for identifying fragments of an XML document. This is accomplished, just as it was with (X)HTML, by appending to a standard URI a pound sign/hash character, #, followed by the fragment identifier itself — the XPointer, in this case. Just as with "regular" URIs, a URI including an XPointer may be relative instead of absolute even beginning with the #. In such cases, what precedes the # still exists after a fashion; it's implied, based on the context in which the URI is found.

The latest version of XPointer includes four specifications:

The Framework provides a foundation describing how XPointer works as a whole and defines the simplest kind of XPointer: a shorthand pointer. The other specifications define particular pieces of more complex XPointers, providing support for XML namespaces (xmlns), a simple element-structure pointer (element), and the full-strength XPath-based approach of the original XPointer work (xpointer).

Considering all the ground it has to cover, the XPath 1.0 Recommendation (the subject of the first part of this book) is a remarkably concise document (the hardcopy from my laser printer clocks it at a mere 29 pages). The XPointer 1.0 specifications, by contrast, seem on their face to be tackling a much simpler subject but in many more pages — 42, to be exact, 22 of which cover just the xpointer( ) scheme. Why the disparity?

XPointer's subject matter is not self-contained but rather spills over into and from other areas of Internet technology. First, of course, there's the need to describe its relationship with XPath. Second, XPointers can be used in XLink documents, so that interface must also be satisfied. Third, there's some duplication of information. And finally, using XPointers requires considering certain standards that are the domain not of the W3C but of the Internet Engineering Task Force (IETF): media types.


XPointer and Media types

The general purpose of any low-level hyperlinking standard, such as XPointer, is the location and retrieval of content to be found at some location other than that of the reference to the content. Most such standards confine themselves to content of a particular type; for instance, an XHTML document's a (anchor) elements can easily locate (X)HTML content but not a bookmark in a Microsoft Word document. XPointer is designed expressly to locate fragments of XML documents' content.

Now, over the Internet, content types — media types — are identified not by such more or less obvious cues as filename extensions or, for that matter, even the nature of the content itself (e.g., "This document begins with an XML declaration, therefore, it must be an XML document"). Rather, what determines the media type of a given resource is how the resource is delivered over the network by its server. If a server broadcasts a stream of what looks like XHTML but declares it to be mere text, an application receiving that stream is supposed to treat it as text.

Of course, nothing is ever that simple. In the first place, how a given data stream is served depends on the proper configuration — by an imperfect human server administrator — of the server itself. If that person hasn't instructed the server how to serve an XHTML document, for instance, it may be transmitted to the network not as XHTML but as plain text.

At the other end of the pipe, client applications bear the burden of interpreting media types "correctly," regardless of how "correctly" they're being served. A browser vendor's product, in theory, should recognize as (X)HTML only documents that are being served as such and not jump to conclusions based on possibly faulty guesswork about the filename extension, whether the first character is a left angle bracket, and so on. This kind of guesswork both contributes to the bloated size of client software and risks subverting the intention of the resource's nominal "owner," the server. (On the other hand, of course, it also makes clients such as web browsers seem more robust than they might otherwise.)


For details about working with XML media types, see the IETF Request for Comment (RFC) 3023 at http://www.ietf.org/rfc/rfc3023. Note that this RFC does not require the use of XPointers to locate XML fragments, although it does mention the W3C work in progress.

Media types are commonly identified in two parts, the top-level type and the subtype, separated by a slash. The top-level type is the general class of data being delivered; the subtype, the specific form of that class of data.

In the case of the XML resources that XPointer can locate, the XPointer Framework specification says that XPointer addresses documents of any of the following four media types: text/xml, application/xml, text/xml-external-parsed-entity, and application/xml-external-parsed-entity. As a general rule, the text top-level media types identify resources that in normal everyday use, would be readable by humans; application media types identify resources meant to be processed by software.

RFC 3023 recommends the use of application media types rather than text for serving XML, in the absence of reasons to the contrary. If the specific subtype — xml or xml-external-parsed-entity, in this case — is unknown to the client application, it will be treated in a generic manner. Thus, human readers of an XML document served as text over the Web, for example, might be treated to a stream of angle brackets, PIs, and the like. Not many such readers would welcome this kind of display.

(Note that the RFC also defines a fifth XML media type, application/xml-dtd, for serving and identifying DTD-type entities. XPointers into DTDs are not possible.)

Some Definitions

The XPointer specifications, like most W3C documents, are very careful to use terminology in well-constrained ways. This can lead to confusion, ironically, because some of these terms are used in loose form in everyday English. That said, let's look at some of the most important such terms as they appear in the standard.


Unlike some of the rest of the terms in this section, "resource" is not defined directly. Instead, under its definition of subresources, the previous version of the XPointer spec said, "...the whole resource being referred to is an XML document or external parsed entity."[1] That is, a resource is the entity within which some specific content resides.

Note that XPointer uses "resource" rather differently than the way it's used in "Uniform Resource Identifier" — the familiar URI of the Web. The authoritative reference for URIs is IETF RFC 2396 ("Uniform Resource Identifiers (URI): Generic Syntax," located at http://www.ietf.org/rfc/rfc2396.txt), which defines the term in this way (emphasis added):

A resource can be anything that has identity. Familiar examples include an electronic document, an image, a service (e.g., "today's weather report for Los Angeles"), and a collection of other resources. Not all resources are network "retrievable"; e.g., human beings, corporations, and bound books in a library can also be considered resources.The resource is the conceptual mapping to an entity or set of entities, not necessarily the entity that corresponds to that mapping at any particular instance in time. Thus, a resource can remain constant even when its content — the entities to which it currently corresponds — changes over time, provided that the conceptual mapping is not changed in the process.

Thus, in a URI such as http://www.example.org/index.html, the resource is not the document so pointed to but the address that points to it. In practice, XPointer works differently: if a given XPointer is to work at all, it must be applied to a portion of a particular XML document, not just to the address of whatever document happens to be there at the time. (On the other hand, one might be excused for wishing that a resource would always be a resource, regardless of the context in which she encounters the term.)


A subresource is the specific fragment of content identified by the XPointer. This may, but will almost never, be an entire document; it's far more likely, because XPointer uses XPath, that the subresource in question will be a single element, or an attribute, or some other (perhaps more complex) node-set. (As you'll see in a moment, however, XPointer expands on XPath's notion of node-sets, in the form of location-sets.)


If the subresource is the what — the content — identified by an XPointer, the location is the where.

(I wouldn't be surprised to see the terms subresource and location being used interchangeably, although this would make purists crazy. If I'm standing at a stove preparing a meal and ask you to "hand me that can," I of course hope you will not empty the can first — that you will deliver it to me contents and all. In the same way, people exchanging URIs often say that someone "sent me a web page"... even though, if anything, they are being sent to the page rather than the other way around.)

The spec makes an analogy between the term "node" as used in XPath and the term "location" as used in XPointer. Why then do we need a new term? Because XPointer can handle (via XPath) not only nodes, but two other forms of addressing: points and ranges. Both of these terms are discussed in the sections that follow.


As an XPointer location is to an XPath node, an XPointer location-set is to an XPath node-set. That is, a location-set is the complete list of locations identified by an XPointer. (And as in the XPath counterparts, if an XPointer has only a single location, it might just as well be considered a location-set containing only a single member.)

Note that the addition of the point and range location types requires enhancing the concept of document order established by XPath. This subject is addressed further in Chapter 9.


A point in XPointer terms is the abstract location between two adjacent chunks of text content. Figure 7-1 illustrates.

Figure 7-1. Point-type locations

Point-type locations

Ignoring the newline characters and other whitespace added for legibility, Figure 7-1 highlights the following point-type locations (among many others present in the document):

  • Immediately following the gerund element's start tag (before the capital B)
  • Immediately following the emph element's start tag (before the p in prime)
  • Between the m and y in my

Point-type locations also exist at the interstices between the element boundaries. For instance, although not highlighted in Figure 7-1, there's a point-type location following the close of the gerund element. Just as important, though, are the places where there are no point-type locations at all. For instance, there is no point-type location between any adjacent characters in the name of an element in the element's start tag.


An XPointer range is the entire block of text content bounded by two points. In Figure 7-1, each individual word in the document exists in a range between the points immediately following and preceding the containing element's start and end tags, respectively. Each individual character also exists in its own range (the points in question immediately precede and follow the character). Note that a point alone doesn't "contain" anything; it must be paired in a range with another point first.

There's no particular requirement that a range be limited to a single containing element. For instance, you could establish a range by fixing one point before the p in prime and another point after the n in senility. The resulting range (again, ignoring whitespace) would be primeofmysen.

Points and Ranges: Flattening the Logical Hierarchy

One interesting aspect of these two XPointer location types, as opposed to the node types available under XPath, is that points and ranges more closely reflect a document's physical characteristics, not its logical or structural ones. There's no hierarchical containment going on in a range (except in the limited sense of "text contained within two points").

This difference is reflected in the common applications to which ranges and points might be put to use. In particular, they're useful as a way of formally expressing the physical act of selecting text, such as with a mouse or with keyboard commands. XPath nodes, on the other hand — while they of course are defined by a given document's physical characteristics, sequences of characters, and such — are purely logical constructs with no physical user-interface counterparts.

On the other hand, because points can't be located within certain portions of the markup (such as in element names), a range can't span just any text physically present in the document — as if you had the source code open in a text editor. What you can select is still constrained by certain features of the logical node tree.

You'll learn much more about points and ranges in the rest of this chapter and the two that follow, especially Chapter 9.

The Framework

The XPointer Framework provides a basic set of rules for creating fragment identifiers. The original XPointer specifications defined a single set of rules for processing XPointers, though later versions offered two levels of conformance. The new framework takes a different approach, defining a more extensible system with a tiny core and an extensible scheme mechanism.

Rather than having to conform to a particular set of rules about how to identify fragments, XPointer processors now have to identify which schemes they support, and then deal with conformance at the scheme level. Error handling has also been generalized, with some syntax errors defined at the Framework level and other errors defined in particular schemes.

The W3C has provided three schemes beyond the XPointer Framework. One of these schemes, xmlns( ), is designed to provide namespace context for other schemes, effectively declaring namespaces for use in other parts of the XPointer. The element( ) scheme can be used to describe locations inside documents by walking the document tree from the root element (or from an ID value in the document) to the desired location. The most ambitious of the schemes, the xpointer( ) scheme, builds on XPath to create a mechanism for locating content that extends XPath's node-based mechanisms to support locations that cross node boundaries.


An xpath( ) scheme that stuck to XPath 1.0 might also be useful, but the W3C hasn't defined such a scheme, preferring to use xpointer( ) for that.

All XPointer processors will have to support the shorthand pointer syntax, but beyond that, developers will have to know what kind of XPointer processor their documents are going to face.

Error Types

Like any application working on information on the Web, XPointers face a number of potential error conditions. The XPointer specifications describe a variety of errors, but they all fall into the three rough categories.

Syntax Errors

Syntax errors are to an XPointer processor what well-formedness errors are to an XML processor — or, indeed, what syntax errors are to the compiler of a traditional programming language: they violate the rules of the language itself. In XPointer as in other languages meant for machine processing, when the standard says "A given left parenthesis must be balanced by a right parenthesis and vice versa," it means just that. You can't "balance" a left parenthesis with a square bracket, an asterisk, or the digit 9. If an XPointer fails syntactically, it doesn't make any difference if it falls into one of the other error classifications — the processor won't even be able to figure it out. Thus, syntax errors might be considered the most "catastrophic" and unrecoverable sort of XPointer error.

Resource Errors

Once an XPointer passes muster syntactically, it's still not out of the woods as a "valid" XPointer. At this point, considering the XPointer as a simple string of characters — which is, after all, what a processor does when it verifies an XPointer's syntax — gives way to determining its usefulness for XPointer's stated purpose: locating some portion(s) of an XML-based document. When the document in question — the resource itself — doesn't exist, the processor must report a resource error to the controlling application.

Subresource Errors

A subresource error, on the other hand, must be reported when a given XPointer identifies a specific subresource that doesn't exist — even though the document itself does. For example, consider this document:


A syntactically correct XPointer (one passing the syntax-error check) could point, correctly, to this document itself (and therefore pass the resource-error check); but if the XPath expression built into the XPointer referred to some descendant of the root element named, say, evil_world_dominaton_plan, the processor would report a subresource error. That is, the resource as a whole exists but not the particular portion of the resource identified by the XPointer.

Note in particular that while an empty node-set is perfectly legitimate in pure XPath terms — XPath allows for such a thing — an empty node-set is an XPointer error, called a "failure" by the XPointer Framework.

Encoding and Escaping Characters in XPointer

As you know, authoring any XML document requires care not to confuse the XML processor. Ampersands must be escaped using the &amp; entity reference, for instance, and greater-than symbols with &gt;. Documents containing XPointers have an additional burden imposed on them, even when the XPointers in question are used in some context other than an XML document: because an XPointer is meant to be included as a part of a URI, it must adhere to the special escaping rules that pertain to that Internet standard as well.

Characters Significant to XPointer Itself

A couple of characters have special significance to the XPointer standard itself. The first of these is the parenthesis, which in XPointer (as in other languages) is always expected to appear in left/right pairs. Under certain (and probably extremely rare) circumstances, you may need to use a parenthesis as a literal character in an XPointer, unbalanced by the opposite parenthesis. In this case, you escape the parenthesis by prefixing it with a circumflex character, ^. Thus, a literal left or right parenthesis would be legally represented by ^( or ^), respectively.

The second XPointer-significant character, therefore, is the circumflex itself. To escape a circumflex — that is, to cause the processor to recognize the character as a literal, not as an escaping character — prefix it with another circumflex. Thus, a literal circumflex can appear in an XPointer as ^^. All other uses of the circumflex character in an XPointer are illegal, resulting in a syntax error.

This is not to say, of course, that a literal circumflex may not be used elsewhere in an instance document. Remember that this constraint applies only to a circumflex within an XPointer.

URI-Significant Characters

It's common for an Internet address — a URI — to include somewhere within it certain special characters such as spaces, percent signs, and so on. Web browsers and other Internet applications may or may not choke on these special characters. But if they follow the URI standard, they're supposed to choke; this standard requires that such characters in a URI be escaped, using a percent sign and the Unicode value for the character.

For instance, consider the following URI:

http://my.dot.com/my documents/index.html

The space between my and documents is supposed to be escaped, using the hexadecimal Unicode value for the space character, so its correct form is:


Like other standards built on the Internet's basic linking infrastructure, XPointer follows the rules established for URIs — including, in this case, the rules for escaping special characters. These rules (as well as the others for URIs) are laid out in two IETF RFCs, numbers 2396 and 2372, which you can find at http://www.ietf.org/rfc/rfc2396.txt and http://www.ietf.org/rfc/rfc2732.txt, respectively.

URIs versus IURIs

The old XPointer 1.0 spec distinguished between plain-old garden-variety URIs and Internationalized URIs, (IURIs). An ordinary URI is limited to ASCII characters; an IURI may consist of any characters directly representable via Unicode. For instance, currently the official home page of the President of the French Republic (currently Jacques Chirac) is http://www.elysee.fr/pres/. All the characters in that URI are ASCII and so it's both a legal URI and a legal IURI. If we were using an IURI-aware application, though, it might be desirable to employ a more correctly French version of the URI: http://www.elysee.fr/prés/. The presence of the é character here is what makes the address both an illegal URI and a legal IURI.

In an IURI, the only difficulty occurs when the address itself includes a literal percent sign. In this case, escape the percent (%) with %25. (25 is the hexadecimal Unicode value for the percent sign.)

The current XPointer Framework doesn't mention IURIs, but this issue is likely to reappear.

Characters in XML Documents

Finally, of course, XPointer — being often used in XML documents — is no exception to the general rules of XML well-formedness. If your XPointer appears in an XML document and includes special XML-significant literal characters, such as greater-than signs or ampersands, you must escape them with standard XML entity references (such as &gt; and &amp;, respectively). Also remember that you must escape, with character entity references, any character(s) in the XPointer that can't be expressed in your instance document's own encoding; if you're working in a U.S.-ASCII-encoded document, therefore, you have to escape additional characters in the ISO-8859 encoding. In this scenario, for example, you'd have to escape Á (capital A with an acute accent) as either &#xc1; (hexadecimal form) or &#193; (decimal form).

Progressive Escaping

XPointer specifications often use syntax that cannot be used directly in a URL. To implement them, you must use a prodecure called progressive escaping. This term is defined nowhere in the spec; but from the context, it clearly refers to how you, an XPointer-using document author, might apply successive levels of the above escaping types to a XPointer.

  1. Start with the completely unescaped form. You might consider this to be the plain-text, common-sense version.
  2. Escape any XPointer-significant characters meant to be interpreted as literals — that is, any parentheses or circumflexes.
  3. Escape any percent signs in the URI portion of the XPointer, to ensure a legal IURI. (This may not be necessary if XPointer specifications continue to ignore IURIs.)
  4. If the XPointer appears in an XML document, escape any XML-significant characters meant to be interpreted as literals, such as ampersands, double quotation marks, and so on.
  5. Escape the entity-reference XML escaping forms as URI escaping forms, for example, replace &quot; with %22.

To repeat, this progressive-escaping mechanism does not suggest that this stack of escaping sieves is to be implemented in an XPointer processor: it's meant to make an XPointer acceptable to such a processor in the first place. On the processing side, each successive application layer — XML, the HTTP-aware application, and XPointer itself — unescapes the XPointer in reverse order, from the outside in as it were.

Progressive escaping: a (perverse) example

Showing you an example of progressive escaping in practice is, to put it mildly, a challenge at this point. In the first place, you don't yet know enough (if anything) about XPointer syntax. Second, you seldom will have to pass even a full-blown URI, including an XPointer, through all five steps.

Still, you do already know something about XPointers: they can include XPath expressions. So let's imagine you're working on some kind of text-processing application and need to construct an XPointer to be used in an XML document; the XPointer includes the following (unescaped) XPath expression:

translate(., "(  )%&^", "[]@v")

This function call scans the string-value of the context node (that's the . used as the first argument) for any of the characters in the second argument, if it finds any of those characters, the function replaces it with the corresponding character in the third argument. So a left parenthesis is replaced with a left square bracket, a percent sign with @, and so on.

In this form, the XPath expression has been subject only to step 1 of all those required above: it's in its "natural language" form (well, as such things go). Now let's walk through the expression and fix it up according to the remaining steps in progressive escaping, so it can be used in an XPointer included in a URI.

  1. Escaping XPointer-significant characters. There are two parentheses and a single circumflex in the XPath expression. To avoid confusing the XPointer processor downstream somewhere, you have to escape these XPointer-significant characters. Do this by preceding them with circumflexes:
    translate(., "^(^)%&^^", "[]@v")

    Note, by the way, that the parentheses enclosing the translate( ) function's arguments do not require escaping, since these are used in a "natural" way.

  2. Escaping IURI-forbidden characters. Our XPath expression has only one percent sign in it, but that's sufficient to choke a processor expecting a valid Internationalized URI. To be sure the processor treats the literal percent sign as such, it must be escaped using an entity reference:
    translate(., "^(^)%25&^^", "[]@v")
  3. Escaping XML-significant characters. The ampersand in the first argument and the quotation marks enclosing the function's arguments are the only characters requiring escaping for XML's purposes. Thus:
    translate(., &quot;^(^)%25&amp;^^&quot;, &quot;[]@v&quot;)
  4. Escaping URI-forbidden characters. Finally, all spaces and any remaining special characters need to be "entitized" in the URI to be acceptable to a URI-aware processor. The expression to this point includes two spaces and the &quot; and &amp; entity references, all of which must be escaped. Furthermore, the expression includes a number of other special characters that are illegal in URIs in their unescaped form: the very circumflexes we so carefully escaped, in step 3. Thus, after passing it through this last step in progressive escaping, we have our final, fully escaped (although, granted, completely incomprehensible) expression:

This is nearly as painful to look at as it was to construct. Just remember that when and if the XPointer specifications make it to full Recommendation status, the XPointer-constructing toolkit (source code editors, automatic code generators, etc.) will certainly obviate the need to build these monstrosities by hand.

(One final caveat: while this example has shown progressive escaping applied to the XPointer portion of a URI, remember that steps 2 through 5 must be taken for special characters in the rest of the URI as well.)


The notion of progressive escaping was covered at some length in the previous version of the XPointer spec but removed in the versions published July 10, 2002. Nonetheless, its an important concept, and one that may find its way back into the final XPointer Recommendations.


  1. An "external parsed entity," unlike an XML document, need not be well-formed. In particular, it may lack a single root element. These quasi-documents are often used for such purposes as holding boilerplate text. They're meaningful — and parseable — only when included (using external entity references) within containing documents.
Personal tools