Revision 110821 of Introduction to using XPath in JavaScript

  • Revision slug: Introduction_to_using_XPath_in_JavaScript
  • Revision title: Introduction to using XPath in JavaScript
  • Revision id: 110821
  • Creator: Jt
  • Is current revision? No

Revision Content

Introduction

This document describes the interface to access XPath functions using JavaScript.

Mozilla implements much of the DOM 3 XPath. This allows XPath expressions to be run against both HTML and XML documents.

The simplest interface to XPath is the evaluate function of the document object, which returns an object of type XPathResult:

var xpathResult = document.evaluate(xpathExpression, contextNode, namespaceResolver, resultType, result);

The evaluate function takes a total of five arguments:

  • xpathExpression: A string containing the XPath expression to be evaluated.
  • contextNode: A node in the document against which the XPath expression should be evaluated. The document node is the most commonly used.
  • namespaceResolver: A function that will be passed any namespace prefixes from xpathExpression and return a string representing the namespace URI associated with that prefix. This enables conversion between the prefixes used in the XPath expressions and the (possibly different) prefixes used in the document. The most commonly used value for this is null, which is used for HTML documents or when no namespace prefixes are used.
  • resultType: A numeric constant, usually written as a named property such as XPathResult.ANY_TYPE, that specifies the type of XPathResult to return (the constants are defined in the relevant section of the DOM Level 3 XPath specification). The most commonly passed constant is XPathResult.ANY_TYPE, which returns the result of the XPath expression in its most natural type.
  • result: An existing XPathResult to use for the results. Passing null causes a new XPathResult to be created.
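Because the five arguments are positional, a small wrapper with defaults can make calls easier to read. The sketch below is a hypothetical helper, evaluateXPath, which is not part of any DOM API. It assumes doc is a Document (or anything with a compatible evaluate method), defaults the context node to the document, the resolver to null and the result type to 0 (the value of XPathResult.ANY_TYPE), and always passes null as the last argument so that a new XPathResult is created.

```javascript
// Hypothetical convenience wrapper, not part of the DOM API.
// `doc` is normally a DOM Document; anything with a compatible
// evaluate() method works.
function evaluateXPath(doc, expression, options = {}) {
  const {
    contextNode = doc, // search the whole document by default
    resolver = null,   // null suits HTML or expressions without prefixes
    resultType = 0,    // 0 is the value of XPathResult.ANY_TYPE
  } = options;
  // The final null asks evaluate() to create a new XPathResult.
  return doc.evaluate(expression, contextNode, resolver, resultType, null);
}

// In a browser:
// const headings = evaluateXPath(document, '//h2');
// const count = evaluateXPath(document, 'count(//p)',
//   { resultType: XPathResult.NUMBER_TYPE }).numberValue;
```

Only the arguments that differ from the common case need to be spelled out; everything else falls back to the values most examples in this document use.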


A Simple Example

To extract the level 2 headings of an HTML document using XPath, the expression is simply '//h2'. The full code for this is then:

var headings = document.evaluate('//h2', document, null, XPathResult.ANY_TYPE, null);

Notice that, since HTML does not have namespaces, we have passed null for the namespaceResolver. Since we wish to search over the entire document for the headings, we have used the document object itself as the contextNode.

The result of this expression is an XPathResult object. If we wish to know the type of result returned, we may check the resultType property of the returned object. In this case it will be 4, which, per the ECMAScript language binding for XPath, represents UNORDERED_NODE_ITERATOR_TYPE. This is the default return type when the result of the XPath expression is a node set. It gives access to one node at a time and makes no promises about the order in which the nodes will be returned. To access the returned nodes, we may use the iterateNext() method of the returned object:

var thisHeading = headings.iterateNext();
var alertText = "Level 2 headings in this document are:\n";
while (thisHeading) {
  alertText += thisHeading.textContent + "\n";
  thisHeading = headings.iterateNext();
}
alert(alertText);

Once we iterate to a node, we have access to all the standard Mozilla-supported DOM interfaces on that node. After iterating through all the h2 elements returned from our expression, any further calls to iterateNext() will return null.
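Because an iterator hands out one node at a time, it is often convenient to collect the nodes into an ordinary array first, especially if the document is going to be modified afterwards (modifying it during iteration invalidates the iterator, as described under Other types of nodesets). A minimal sketch; iteratorToArray is a hypothetical helper name, and the loop relies only on iterateNext() returning null at the end, as described above.

```javascript
// Hypothetical helper: collects every node from an iterator-type
// XPathResult (UNORDERED_ or ORDERED_NODE_ITERATOR_TYPE) into an array.
function iteratorToArray(result) {
  const nodes = [];
  // iterateNext() returns null once all matching nodes have been visited.
  let node = result.iterateNext();
  while (node) {
    nodes.push(node);
    node = result.iterateNext();
  }
  return nodes;
}

// In a browser:
// const headings = iteratorToArray(
//   document.evaluate('//h2', document, null, XPathResult.ANY_TYPE, null));
// alert(headings.map((h) => h.textContent).join('\n'));
```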

Returning results other than node sets

In many situations, the return value of an XPath expression is not a node set but a simpler type: a number, a string or a boolean value. For expressions of this type, we still obtain an XPathResult object from our call to document.evaluate, but we must read its numberValue, stringValue or booleanValue property to retrieve the result.

A simple example is using the XPath expression count(//p) to obtain the number of paragraphs in an HTML document:

var paragraphCount = document.evaluate("count(//p)", document, null, XPathResult.ANY_TYPE,null).numberValue;
alert("This document contains " + paragraphCount + " paragraphs");

Although JavaScript would convert the number to a string for display, the XPath interface does not automatically convert a numeric result when the stringValue property is requested, so the following code will not work:

var paragraphCount = document.evaluate("count(//p)", document, null, XPathResult.ANY_TYPE,null).stringValue;
alert("This document contains " + paragraphCount + " paragraphs");

Instead of returning a string, reading stringValue throws an exception (NS_DOM_TYPE_ERROR), because the result is a number. It is possible to request a specific type of return value through the resultType argument. To force a string return type, we can pass the constant XPathResult.STRING_TYPE, so the following code will work:

var paragraphCount = document.evaluate("count(//p)", document, null, XPathResult.STRING_TYPE, null).stringValue;
alert("This document contains " + paragraphCount + " paragraphs");

The constants corresponding to the other simple types follow the same naming pattern:

  • XPathResult.STRING_TYPE: Return the result as a string where possible
  • XPathResult.NUMBER_TYPE: Return the result as a floating point number where possible
  • XPathResult.BOOLEAN_TYPE: Return the result as a boolean where possible
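When ANY_TYPE is passed, the resultType property of the returned object tells us which of the three value properties can safely be read. The sketch below is a hypothetical helper, scalarValue, that reads the matching property and refuses node-set results; the numeric codes 1, 2 and 3 are the values of NUMBER_TYPE, STRING_TYPE and BOOLEAN_TYPE given in the Result Types table.

```javascript
// Hypothetical helper: reads the scalar value of an XPathResult according
// to its resultType (1 = NUMBER_TYPE, 2 = STRING_TYPE, 3 = BOOLEAN_TYPE).
function scalarValue(result) {
  switch (result.resultType) {
    case 1:
      return result.numberValue;
    case 2:
      return result.stringValue;
    case 3:
      return result.booleanValue;
    default:
      // Node-set results have no scalar value; use iterateNext(),
      // snapshotItem() or singleNodeValue instead.
      throw new TypeError('Not a scalar XPathResult (resultType ' +
                          result.resultType + ')');
  }
}

// In a browser:
// const paragraphCount = scalarValue(
//   document.evaluate('count(//p)', document, null, XPathResult.ANY_TYPE, null));
```

Reading the property that matches resultType avoids the type error that occurs when, for example, stringValue is requested for a numeric result.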

Other types of nodesets

In the earlier example of reading all the level 2 headings in a document, the node set returned was of type UNORDERED_NODE_ITERATOR_TYPE. The XPath interface allows node sets to be returned in a variety of different ways. There are three principal categories of node sets that can be returned:

  • Iterators: An iterator allows access to one node at a time. The next node is returned by the iterateNext() method of the XPathResult object. If the document is modified after the XPath evaluation, the invalidIteratorState property becomes true and further calls to iterateNext() throw an error.
  • Snapshots: A static list of nodes that match the XPath expression, accessed through the snapshotItem(itemNumber) method of the XPathResult, where itemNumber is the zero-based index of the node to be retrieved. The number of nodes returned is given by the snapshotLength property. Snapshots are not updated when the document changes, so after a mutation they may contain nodes that are no longer in the document, or miss nodes that would now match.
  • First nodes: Only the first found node matching the XPath expression is returned. This may be accessed through the singleNodeValue property of the XPathResult object.
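Reading a snapshot is a simple index loop over snapshotLength. A minimal sketch, with snapshotToArray as a hypothetical helper name; the comments also show the single-node case, where the node is read directly from singleNodeValue.

```javascript
// Hypothetical helper: copies a snapshot-type XPathResult
// (UNORDERED_ or ORDERED_NODE_SNAPSHOT_TYPE) into an array.
// snapshotItem() indexes run from 0 to snapshotLength - 1.
function snapshotToArray(result) {
  const nodes = [];
  for (let i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}

// In a browser:
// const paragraphs = snapshotToArray(document.evaluate('//p', document, null,
//   XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null));
// For the single-node types, read singleNodeValue (null when nothing matches):
// const firstHeading = document.evaluate('//h2', document, null,
//   XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
```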

For each of these node set types, there is a subtype that is guaranteed to retain document order and one that is not. The full list of type codes that can be passed as a resultType argument is:

  • XPathResult.UNORDERED_NODE_ITERATOR_TYPE: Iterator, Unordered
  • XPathResult.ORDERED_NODE_ITERATOR_TYPE: Iterator, Ordered
  • XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE: Snapshot, Unordered
  • XPathResult.ORDERED_NODE_SNAPSHOT_TYPE: Snapshot, Ordered
  • XPathResult.ANY_UNORDERED_NODE_TYPE: Single node, Unordered
  • XPathResult.FIRST_ORDERED_NODE_TYPE: Single node, Ordered

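As an example of a snapshot used while the document changes, and of a context node other than the document, the sketch below removes matching nodes below one element. removeMatches is a hypothetical helper, and 7 is the value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE from the Result Types table. A snapshot is chosen because removing nodes would invalidate an iterator. Note also that an expression beginning with // searches from the root of the document regardless of the context node, so a relative expression such as .//em is needed to stay below it.

```javascript
// Value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE (see Result Types).
const ORDERED_NODE_SNAPSHOT_TYPE = 7;

// Hypothetical helper: removes every node matched by `expression`,
// evaluated relative to `contextNode`. Returns how many nodes were removed.
function removeMatches(doc, expression, contextNode) {
  // A snapshot stays valid while we mutate the document; an iterator would not.
  const snapshot = doc.evaluate(expression, contextNode, null,
                                ORDERED_NODE_SNAPSHOT_TYPE, null);
  let removed = 0;
  for (let i = 0; i < snapshot.snapshotLength; i++) {
    const node = snapshot.snapshotItem(i);
    // Snapshot items are the document's own nodes, so this edits the document.
    // Nodes without a parent (e.g. attributes) are skipped.
    if (node.parentNode) {
      node.parentNode.removeChild(node);
      removed++;
    }
  }
  return removed;
}

// In a browser ("main" is a placeholder id):
// removeMatches(document, './/em', document.getElementById('main'));
```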

Using XPath with XML

All the examples so far have been designed to work with HTML documents in which there are no namespaces. In order to use XPath on XML documents, one must provide a function that maps each namespace prefix in the XPath expression to the namespace URI it stands for. Such a function can be provided explicitly or can be created from a node in the target document.
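A resolver can be derived from a node that already has the right prefixes in scope. The DOM offers document.createNSResolver(node) for this; the sketch below shows the same idea by hand using the standard Node.lookupNamespaceURI(prefix) method. resolverFromNode is a hypothetical name, and the approach assumes the prefixes used in the expression are declared on or above that node under the same prefix names.

```javascript
// Hypothetical helper: builds a resolver function from a node in the target
// document. Node.lookupNamespaceURI(prefix) returns the namespace URI bound
// to `prefix` in scope at that node, or null if there is none.
function resolverFromNode(node) {
  return function (prefix) {
    return node.lookupNamespaceURI(prefix);
  };
}

// In a browser, for an XML document whose root element declares the prefix:
// const resolver = resolverFromNode(xmlDoc.documentElement);
// xmlDoc.evaluate('//foo:item', xmlDoc, resolver, XPathResult.ANY_TYPE, null);
```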

Implementing a Namespace Resolver

Namespace resolvers are simply functions that take namespace prefixes from the XPath expression and return the corresponding URI. For example, the expression:

//html:td/mathml:math

might be used to select all MathML expressions that are children of HTML table cells. To associate the mathml: prefix with the namespace URI http://www.w3.org/1998/Math/MathML and html: with the URI http://www.w3.org/1999/xhtml, we can provide a function:

function NSResolver(prefix) {
  if (prefix === 'html') {
    return 'http://www.w3.org/1999/xhtml';
  } else if (prefix === 'mathml') {
    return 'http://www.w3.org/1998/Math/MathML';
  } else {
    // Unknown prefix: returning null makes evaluate() raise a namespace error
    return null;
  }
}

Our call to document.evaluate then looks like:

document.evaluate("//html:td/mathml:math", document, NSResolver, XPathResult.ANY_TYPE, null);
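As the number of prefixes grows, an if/else chain becomes tedious. One alternative, sketched here under the assumption that a plain object is a convenient place to keep the mapping, is a hypothetical makeNSResolver factory that looks prefixes up in a table and returns null for anything unknown.

```javascript
// Hypothetical helper: builds a resolver from a plain prefix-to-URI map,
// so new prefixes can be added without editing an if/else chain.
function makeNSResolver(namespaces) {
  return function (prefix) {
    // hasOwnProperty avoids matching inherited keys such as "toString".
    return Object.prototype.hasOwnProperty.call(namespaces, prefix)
      ? namespaces[prefix]
      : null; // unknown prefix: evaluate() should raise a namespace error
  };
}

const resolver = makeNSResolver({
  html: 'http://www.w3.org/1999/xhtml',
  mathml: 'http://www.w3.org/1998/Math/MathML',
});

// In a browser:
// document.evaluate('//html:td/mathml:math', document, resolver,
//   XPathResult.ANY_TYPE, null);
```

The resolver behaves like NSResolver above, but the mapping lives in data rather than code.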

Result Types

  • ANY_TYPE (0): Whatever type naturally results from the given expression.
  • NUMBER_TYPE (1): A result set containing a single number. Useful, for example, in an XPath expression using the count() function.
  • STRING_TYPE (2): A result set containing a single string.
  • BOOLEAN_TYPE (3): A result set containing a single boolean value. Useful, for example, in an XPath expression using the not() function.
  • UNORDERED_NODE_ITERATOR_TYPE (4): A result set containing all the nodes matching the expression. The nodes in the result set are not necessarily in the same order they appear in the document.
  • ORDERED_NODE_ITERATOR_TYPE (5): A result set containing all the nodes matching the expression. The nodes in the result set are in the same order they appear in the document.
  • UNORDERED_NODE_SNAPSHOT_TYPE (6): A static snapshot list of all the nodes matching the expression. The nodes in the result set are not necessarily in the same order they appear in the document.
  • ORDERED_NODE_SNAPSHOT_TYPE (7): A static snapshot list of all the nodes matching the expression. The nodes in the result set are in the same order they appear in the document.
  • ANY_UNORDERED_NODE_TYPE (8): A result set containing any single node that matches the expression. The node is not necessarily the first node in the document that matches the expression.
  • FIRST_ORDERED_NODE_TYPE (9): A result set containing the first node in the document that matches the expression.
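When debugging, it can be handy to turn a numeric resultType back into its constant name. A minimal sketch using the values listed in this section; RESULT_TYPE_NAMES and resultTypeName are hypothetical names, and in a browser the same numbers are also available as properties of XPathResult.

```javascript
// Result type names indexed by their numeric value (0 to 9).
const RESULT_TYPE_NAMES = [
  'ANY_TYPE',                     // 0
  'NUMBER_TYPE',                  // 1
  'STRING_TYPE',                  // 2
  'BOOLEAN_TYPE',                 // 3
  'UNORDERED_NODE_ITERATOR_TYPE', // 4
  'ORDERED_NODE_ITERATOR_TYPE',   // 5
  'UNORDERED_NODE_SNAPSHOT_TYPE', // 6
  'ORDERED_NODE_SNAPSHOT_TYPE',   // 7
  'ANY_UNORDERED_NODE_TYPE',      // 8
  'FIRST_ORDERED_NODE_TYPE',      // 9
];

// Hypothetical helper: turns a numeric resultType into its constant name.
function resultTypeName(code) {
  const name = RESULT_TYPE_NAMES[code];
  return name !== undefined ? name : 'unknown (' + code + ')';
}
```

For the heading example earlier, resultTypeName(headings.resultType) would give UNORDERED_NODE_ITERATOR_TYPE.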

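Results of NODE_ITERATOR types contain references to nodes in the document. Modifying the document invalidates the iterator: its invalidIteratorState property becomes true, and any further call to iterateNext() throws an error.

Results of NODE_SNAPSHOT types are static lists of references to nodes in the document, not copies. Modifying the document does not invalidate a snapshot, but the snapshot may no longer reflect the document, since nodes may have been moved, changed, added or removed. Because the items are the document's own nodes, modifying them does modify the document.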

Revision Source

<h3 name="Introduction">Introduction</h3>
<p>This document describes the interface to access <a href="en/XPath/Functions"> XPath functions</a> using JavaScript.
</p><p>Mozilla implements much of the <a class="external" href="http://www.w3.org/TR/DOM-Level-3-XPath/xpath.html">DOM 3 XPath</a>. This allows XPath expressions to be run against both HTML and XML documents.
</p><p>The simplest interface to XPath is the <a href="en/DOM/document.evaluate">evaluate</a> function of the <a href="en/DOM/document">document</a> object, which returns an object of type <a href="en/XPathResult">XPathResult</a>:
</p>
<pre>var xpathResult = document.evaluate(xpathExpression, contextNode, namespaceResolver, resultType, result);
</pre>
<p>The evaluate function takes a total of five arguments:
</p>
<ul><li><code>xpathExpression</code>: A string containing the XPath expression to be evaluated.
</li></ul>
<ul><li><code>contextNode</code>: A node in the document against which the XPath expression should be evaluated. The <a href="en/DOM/document">document</a> node is the most commonly used.
</li></ul>
<ul><li><code>namespaceResolver</code>: A function that will be passed any namespace prefixes from <code>xpathExpression</code> and return a string representing the namespace URI associated with that prefix. This enables conversion between the prefixes used in the XPath expressions and the (possibly different) prefixes used in the document. The most commonly used value for this is <code>null</code>, which is used for HTML documents or when no namespace prefixes are used.
</li></ul>
<ul><li><code>resultType</code>: A numeric constant, usually written as a named property such as <code>XPathResult.ANY_TYPE</code>, that specifies the type of <code>XPathResult</code> to return (the constants are defined in the relevant section of the DOM Level 3 XPath specification). The most commonly passed constant is <code>XPathResult.ANY_TYPE</code>, which returns the result of the XPath expression in its most natural type.
</li></ul>
<ul><li> <code>result</code>: An existing <code>XPathResult</code> to use for the results. Passing <code>null</code> causes a new <code>XPathResult</code> to be created.
</li></ul>
<h3 name="A_Simple_Example">A Simple Example</h3>
<p>To extract the level 2 headings of an HTML document using XPath, the expression is simply <code>'//h2'</code>. The full code for this is then:
</p>
<pre>var headings = document.evaluate('//h2', document, null, XPathResult.ANY_TYPE, null);</pre>
<p>Notice that, since HTML does not have namespaces, we have passed <code>null</code> for the <code>namespaceResolver</code>. Since we wish to search over the entire document for the headings, we have used the <a href="en/DOM/document">document</a> object itself as the <code>contextNode</code>.
</p><p>The result of this expression is an <code>XPathResult</code> object. If we wish to know the type of result returned, we may check the <code>resultType</code> property of the returned object. In this case it will be 4, which, per the ECMAScript language binding for XPath, represents <code>UNORDERED_NODE_ITERATOR_TYPE</code>. This is the default return type when the result of the XPath expression is a node set. It gives access to one node at a time and makes no promises about the order in which the nodes will be returned. To access the returned nodes, we may use the <code>iterateNext()</code> method of the returned object:
</p>
<pre>var thisHeading = headings.iterateNext();
var alertText = "Level 2 headings in this document are:\n";
while (thisHeading) {
  alertText += thisHeading.textContent + "\n";
  thisHeading = headings.iterateNext();
}
alert(alertText);
</pre>
<p>Once we iterate to a node, we have access to all the standard Mozilla-supported DOM interfaces on that node. After iterating through all the h2 elements returned from our expression, any further calls to iterateNext() will return null.
</p>
<h3 name="Returning_results_other_than_node_sets">Returning results other than node sets</h3>
<p>In many situations, the return value of an XPath expression is not a node set but a simpler type: a number, a string or a boolean value. For expressions of this type, we still obtain an XPathResult object from our call to document.evaluate, but we must read its numberValue, stringValue or booleanValue property to retrieve the result.
</p><p>A simple example is using the XPath expression count(//p) to obtain the number of paragraphs in an HTML document:
</p>
<pre>var paragraphCount = document.evaluate("count(//p)", document, null, XPathResult.ANY_TYPE,null).numberValue;
alert("This document contains " + paragraphCount + " paragraphs");
</pre>
<p>Although JavaScript would convert the number to a string for display, the XPath interface does not automatically convert a numeric result when the stringValue property is requested, so the following code will not work:
</p>
<pre>var paragraphCount = document.evaluate("count(//p)", document, null, XPathResult.ANY_TYPE,null).stringValue;
alert("This document contains " + paragraphCount + " paragraphs");
</pre>
<p>Instead of returning a string, reading stringValue throws an exception (NS_DOM_TYPE_ERROR), because the result is a number. It is possible to request a specific type of return value through the resultType argument. To force a string return type, we can pass the constant XPathResult.STRING_TYPE, so the following code will work:
</p>
<pre>var paragraphCount = document.evaluate("count(//p)", document, null, XPathResult.STRING_TYPE, null).stringValue;
alert("This document contains " + paragraphCount + " paragraphs");
</pre>
<p>The constants corresponding to the other simple types follow the same naming pattern:
</p>
<ul><li> XPathResult.STRING_TYPE: Return the result as a string where possible
</li><li> XPathResult.NUMBER_TYPE: Return the result as a floating point number where possible
</li><li> XPathResult.BOOLEAN_TYPE: Return the result as a boolean where possible
</li></ul>
<h3 name="Other_types_of_nodesets">Other types of nodesets</h3>
<p>In the earlier example of reading all the level 2 headings in a document, the node set returned was of type UNORDERED_NODE_ITERATOR_TYPE. The XPath interface allows node sets to be returned in a variety of different ways. There are three principal categories of node sets that can be returned:
</p>
<ul><li>Iterators: An iterator allows access to one node at a time. The next node is returned by the iterateNext() method of the XPathResult object. If the document is modified after the XPath evaluation, the invalidIteratorState property becomes true and further calls to iterateNext() throw an error.
</li></ul>
<ul><li>Snapshots: A static list of nodes that match the XPath expression, accessed through the snapshotItem(itemNumber) method of the XPathResult, where itemNumber is the zero-based index of the node to be retrieved. The number of nodes returned is given by the snapshotLength property. Snapshots are not updated when the document changes, so after a mutation they may contain nodes that are no longer in the document, or miss nodes that would now match.
</li><li>First nodes: Only the first found node matching the XPath expression is returned. This may be accessed through the singleNodeValue property of the XPathResult object.
</li></ul>
<p>For each of these node set types, there is a subtype that is guaranteed to retain document order and one that is not. The full list of type codes that can be passed as a resultType argument is:
</p>
<ul><li>XPathResult.UNORDERED_NODE_ITERATOR_TYPE: Iterator, Unordered
</li><li>XPathResult.ORDERED_NODE_ITERATOR_TYPE: Iterator, Ordered
</li><li>XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE: Snapshot, Unordered
</li><li>XPathResult.ORDERED_NODE_SNAPSHOT_TYPE: Snapshot, Ordered
</li><li>XPathResult.ANY_UNORDERED_NODE_TYPE: Single node, Unordered
</li><li>XPathResult.FIRST_ORDERED_NODE_TYPE: Single node, Ordered
</li></ul>
<h3 name="Using_XPath_with_XML">Using XPath with XML</h3>
<p>All the examples so far have been designed to work with HTML documents in which there are no namespaces. In order to use XPath on XML documents, one must provide a function that maps each namespace prefix in the XPath expression to the namespace URI it stands for. Such a function can be provided explicitly or can be created from a node in the target document.
</p>
<h4 name="Implementing_a_Namespace_Resolver">Implementing a Namespace Resolver</h4>
<p>Namespace resolvers are simply functions that take namespace prefixes from the XPath expression and return the corresponding URI. For example, the expression:
</p><p>//html:td/mathml:math
</p><p>might be used to select all MathML expressions that are children of HTML table cells. To associate the mathml: prefix with the namespace URI http://www.w3.org/1998/Math/MathML and html: with the URI http://www.w3.org/1999/xhtml, we can provide a function:
</p>
<pre>function NSResolver(prefix) {
  if (prefix === 'html') {
    return 'http://www.w3.org/1999/xhtml';
  } else if (prefix === 'mathml') {
    return 'http://www.w3.org/1998/Math/MathML';
  } else {
    // Unknown prefix: returning null makes evaluate() raise a namespace error
    return null;
  }
}
</pre>
<p>Our call to document.evaluate then looks like:
</p>
<pre>document.evaluate("//html:td/mathml:math", document, NSResolver, XPathResult.ANY_TYPE, null);
</pre>
<h3 name="Result_Types">Result Types</h3>
<table class="standard-table">

<tbody><tr>
<td class="header">Result Type
</td><td class="header">Value
</td><td class="header">Description
</td></tr>

<tr>
<td>ANY_TYPE
</td><td>0
</td><td>Whatever type naturally results from the given expression.
</td></tr>

<tr>
<td>NUMBER_TYPE
</td><td>1
</td><td>A result set containing a single number.  Useful, for example, in an XPath expression using the <code>count()</code> function.
</td></tr>

<tr>
<td>STRING_TYPE
</td><td>2
</td><td>A result set containing a single string.
</td></tr>

<tr>
<td>BOOLEAN_TYPE
</td><td>3
</td><td>A result set containing a single boolean value. Useful, for example, in an XPath expression using the <code>not()</code> function.
</td></tr>

<tr>
<td>UNORDERED_NODE_ITERATOR_TYPE
</td><td>4
</td><td>A result set containing all the nodes matching the expression.  The nodes in the result set are not necessarily in the same order they appear in the document.
</td></tr>

<tr>
<td>ORDERED_NODE_ITERATOR_TYPE
</td><td>5
</td><td>A result set containing all the nodes matching the expression.  The nodes in the result set are in the same order they appear in the document.
</td></tr>

<tr>
<td>UNORDERED_NODE_SNAPSHOT_TYPE
</td><td>6
</td><td>A result set containing snapshots of all the nodes matching the expression. The nodes in the result set are not necessarily in the same order they appear in the document.
</td></tr>

<tr>
<td>ORDERED_NODE_SNAPSHOT_TYPE
</td><td>7
</td><td>A result set containing snapshots of all the nodes matching the expression. The nodes in the result set are in the same order they appear in the document.
</td></tr>

<tr>
<td>ANY_UNORDERED_NODE_TYPE
</td><td>8
</td><td>A result set containing any single node that matches the expression. The node is not necessarily the first node in the document that matches the expression.
</td></tr>

<tr>
<td>FIRST_ORDERED_NODE_TYPE
</td><td>9
</td><td>A result set containing the first node in the document that matches the expression.
</td></tr>
</tbody></table>
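<p>Results of NODE_ITERATOR types contain references to nodes in the document. Modifying the document invalidates the iterator: its <code>invalidIteratorState</code> property becomes true, and any further call to <code>iterateNext()</code> throws an error.
</p><p>Results of NODE_SNAPSHOT types are static lists of references to nodes in the document, <i>not</i> copies. Modifying the document does not invalidate a snapshot, but the snapshot may no longer reflect the document, since nodes may have been moved, changed, added or removed. Because the items are the document's own nodes, modifying them does modify the document.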
</p>