Introduction to using XPath in JavaScript

  • Revision slug: Introduction_to_using_XPath_in_JavaScript
  • Revision title: Introduction to using XPath in JavaScript
  • Revision id: 110850
  • Created:
  • Creator: Jt
  • Is current revision? No
  • Comment /* Code */

Revision Content

This document describes the interface to access XPath functions using JavaScript.

Mozilla implements much of the DOM Level 3 XPath specification. This allows XPath expressions to be run against both HTML and XML documents.

Code

The simplest interface to XPath is the evaluate function of the document object, which returns an object of type XPathResult:

The evaluate Function of the document Object

var xpathResult = document.evaluate( xpathExpression, contextNode, namespaceResolver, resultType, result );

The evaluate function takes a total of five arguments:

  • xpathExpression: A string containing the XPath expression to be evaluated.
  • contextNode: A node in the document against which the xpathExpression should be evaluated. The document node is the most commonly used.
  • namespaceResolver: A function that is passed any namespace prefixes contained within xpathExpression and returns a string representing the namespace URI associated with that prefix. This enables conversion between the prefixes used in the XPath expressions and the possibly different prefixes used in the document. The resolver can be one of:
    • A resolver created by the createNSResolver method of an XPathEvaluator object. Use this if you are not sure what you need to use.
    • null, which is used for HTML documents or when no namespace prefixes are used. Note that if the xpathExpression contains a namespace prefix, this will result in a DOMException being thrown with the code NAMESPACE_ERR.
    • A custom user-defined function.
  • resultType: A constant that defines the desired result type to be returned as a result of the evaluation. The most commonly passed constant is XPathResult.ANY_TYPE which will return the results of the XPath expression as the most natural type.
  • result: Either an existing XPathResult which is to be reused to return the results, or null to create a new XPathResult.
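
The five arguments can be seen together in a small wrapper. This is only a minimal sketch: the helper name evaluateXPath and its doc parameter are assumptions made for illustration and are not part of the DOM API. The value 0 for ANY_TYPE is taken from the constants table in the appendix.

```javascript
// Hypothetical helper, not a DOM function: it only illustrates the argument order.
// doc is any object exposing the DOM 3 XPath evaluate() method (normally window.document).
function evaluateXPath(doc, xpathExpression, contextNode) {
  var ANY_TYPE = 0; // same numeric value as XPathResult.ANY_TYPE
  return doc.evaluate(
    xpathExpression,     // the XPath expression, as a string
    contextNode || doc,  // fall back to the document node
    null,                // no namespace resolver: the expression must not use prefixes
    ANY_TYPE,            // let the implementation choose the most natural type
    null                 // do not reuse an existing XPathResult
  );
}
```

In a browser this would be called as evaluateXPath(document, '//h2'). Passing the document in as a parameter, rather than using the global, also makes it possible to run the same query against a document loaded separately.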

Implementing a Namespace Resolver

We will create a namespace resolver using the createNSResolver method of an XPathEvaluator object.

  var xpEvaluator = new XPathEvaluator();

  var nsResolver = xpEvaluator.createNSResolver( aNode.ownerDocument == null ? aNode.documentElement : aNode.ownerDocument.documentElement );

We then pass nsResolver to document.evaluate as the namespaceResolver argument.

A Simple Example with an HTML Document

To extract all the <h2> heading elements in an HTML document using XPath, the xpathExpression is simply '//h2', where // is the recursive descent operator that matches elements with the nodeName h2 anywhere in the document tree. The full code for this is:

var headings = document.evaluate('//h2', document, null, XPathResult.ANY_TYPE, null);

Notice that, since HTML does not have namespaces, we have passed null for the namespaceResolver argument.

Since we wish to search over the entire document for the headings, we have used the document object itself as the contextNode.

The result of this expression is an XPathResult object. If we wish to know the type of result returned, we may evaluate the resultType property of the returned object. In this case that will evaluate to 4, a UNORDERED_NODE_ITERATOR_TYPE. This is the default return type when the result of the XPath expression is a node-set. It provides access to a single node at a time and may not return nodes in a particular order. To access the returned nodes, we use the iterateNext() method of the returned object:

var thisHeading = headings.iterateNext();
var alertText = 'Level 2 headings in this document are:\n';
while (thisHeading) {
  alertText += thisHeading.textContent + '\n';
  thisHeading = headings.iterateNext();
}

Once we iterate to a node, we have access to all the standard DOM interfaces on that node. After iterating through all the h2 elements returned from our expression, any further calls to iterateNext() will return null.
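
The iteration loop above can be wrapped in a reusable function. The following is a sketch under the assumption that the argument is an iterator-type XPathResult; the name collectNodes is hypothetical and is not part of the XPath interface.

```javascript
// Hypothetical helper: gathers every node from an iterator-type XPathResult
// into a plain array, in the order in which iterateNext() hands them out.
function collectNodes(xpathResult) {
  var nodes = [];
  var node = xpathResult.iterateNext();
  while (node) {            // iterateNext() returns null once the nodes are exhausted
    nodes.push(node);
    node = xpathResult.iterateNext();
  }
  return nodes;
}
```

In a browser, collectNodes(document.evaluate('//h2', document, null, XPathResult.ANY_TYPE, null)) would give an array of the h2 elements, which can then be processed with the usual array methods.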


Return Types

The result of an XPath expression can be either a simple value (simple types) or a collection of nodes (node-set types).

Simple Types

The result of some XPath expressions is not a node set but a simpler type: a floating point number (NUMBER_TYPE), a string (STRING_TYPE), or a boolean value (BOOLEAN_TYPE). For expressions of this type, we still obtain an XPathResult object from our call to document.evaluate(), but we must access the numberValue, stringValue or booleanValue property, respectively, of the XPathResult object to retrieve the result.

A simple example is using the XPath expression count(//p) to obtain the number of <p> elements in an HTML document:

var paragraphCount = document.evaluate('count(//p)', document, null, XPathResult.ANY_TYPE, null).numberValue;
alert('This document contains ' + paragraphCount + ' paragraph elements');

Although JavaScript allows us to convert the number to a string for display, the XPath interface will not automatically convert the numerical result if the stringValue property is requested, so the following code will not work:

var paragraphCount = document.evaluate('count(//p)', document, null, XPathResult.ANY_TYPE, null).stringValue;
alert('This document contains ' + paragraphCount + ' paragraph elements');

Instead it will throw an exception with the code NS_DOM_TYPE_ERROR. It is possible to request a specific type of return value by changing the resultType argument. In order to force a string return type, we can pass the constant XPathResult.STRING_TYPE as resultType, so the following code will work:

var paragraphCount = document.evaluate('count(//p)', document, null, XPathResult.STRING_TYPE, null).stringValue;
alert('This document contains ' + paragraphCount + ' paragraph elements');
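
Because each simple type has its own value property, code that does not know the result type in advance has to check resultType first. The sketch below shows one way this might be done; the helper name readSimpleValue is hypothetical, and the numeric constants are copied from the constants table in the appendix so that the snippet does not depend on the XPathResult global.

```javascript
// Numeric values of the simple result types, as listed in the constants table.
var NUMBER_TYPE = 1;
var STRING_TYPE = 2;
var BOOLEAN_TYPE = 3;

// Hypothetical helper: returns the value held by a simple-typed XPathResult,
// or undefined when the result is one of the node-set types.
function readSimpleValue(xpathResult) {
  switch (xpathResult.resultType) {
    case NUMBER_TYPE:
      return xpathResult.numberValue;
    case STRING_TYPE:
      return xpathResult.stringValue;
    case BOOLEAN_TYPE:
      return xpathResult.booleanValue;
    default:
      return undefined;
  }
}
```

Only the property that matches resultType is read, which avoids the type error described above.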

Node-Set Types

In the earlier example of matching all the h2 headings in a document, the node-set returned was of type UNORDERED_NODE_ITERATOR_TYPE. The XPath interface allows node-sets to be returned in a variety of different ways. There are 3 principal categories of node-set types that can be returned:

  • Iterators: An iterator which allows access to the matched nodes one at a time. The next node can be accessed with the iterateNext() method of the XPathResult object. If the document is modified between iterations, the iteration is invalidated and the invalidIteratorState property is set to true.
  • Snapshots: A static list of matched nodes, which are accessed through the snapshotItem(itemNumber) method of XPathResult, where itemNumber is the index of the node to be retrieved. The number of nodes returned can be accessed through the snapshotLength property. Snapshots do not change with document mutations, so unlike the iterator result the snapshot does not become invalid. It may, however, no longer correspond to the current document: nodes may have been moved, the snapshot might contain nodes that no longer exist, or it might be an incomplete set of results.
  • First Nodes: Only the first found node matching the XPath expression is returned. This may be accessed through the singleNodeValue property of the XPathResult object. This will be null if the node set is empty. For the unordered subtype the single node returned might not be the first in document order.

For each of these node set types, there are 2 subtypes:

  • Ordered: Guaranteed to retain the nodes in the document order.
  • Unordered: May not produce nodes in a particular order.
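
As an illustration of the snapshot category, the following sketch copies the nodes of a snapshot-type result into an array using only snapshotLength and snapshotItem(). The helper name snapshotToArray is an assumption made for this example and is not part of the XPath interface.

```javascript
// Hypothetical helper: copies the nodes of a snapshot-type XPathResult
// into a plain array using snapshotLength and snapshotItem(index).
function snapshotToArray(xpathResult) {
  var nodes = [];
  for (var i = 0; i < xpathResult.snapshotLength; i++) {
    nodes.push(xpathResult.snapshotItem(i));
  }
  return nodes;
}
```

In a browser, the result would be requested with XPathResult.ORDERED_NODE_SNAPSHOT_TYPE (or the unordered variant) as the resultType. Since a snapshot is not invalidated by document changes, this form is a reasonable choice when the loop body adds or removes nodes.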

Using XPath with XML documents

The examples so far have been designed to work with HTML documents in which there are no namespaces. In order to use XPath on XML documents, we must provide a namespace resolver function for converting namespace prefixes in the document.

Appendix

Using a User Defined Namespace Resolver

This function will need to take namespace prefixes from the xpathExpression and return the URI that corresponds to that prefix. For example, the expression:

//xhtml:td/mathml:math

will select all MathML expressions that are the children of (X)HTML table data cell elements.

To associate the mathml: prefix with the namespace URI 'http://www.w3.org/1998/Math/MathML' and the xhtml: prefix with the URI 'http://www.w3.org/1999/xhtml', we provide a function:

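// Map each prefix used in the XPath expression to its namespace URI.
function NSResolver( prefix ) 
{
  if ( prefix == 'xhtml' ) 
  {
    return 'http://www.w3.org/1999/xhtml';
  }
  else if ( prefix == 'mathml' ) 
  {
    return 'http://www.w3.org/1998/Math/MathML';
  }
  else
  {
    // Unknown prefix: no namespace URI is associated with it.
    return null;
  }
}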

Our call to document.evaluate would then look like:

document.evaluate('//xhtml:td/mathml:math', document, NSResolver, XPathResult.ANY_TYPE, null);
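
When more than a couple of prefixes are involved, the chain of if statements can be replaced by a lookup table. The following is a sketch of that idea rather than a prescribed approach; the factory name makeNSResolver is hypothetical.

```javascript
// Hypothetical factory: builds a namespace resolver function from an object
// that maps prefixes to namespace URIs. Unknown prefixes resolve to null.
function makeNSResolver(prefixToURI) {
  return function (prefix) {
    // hasOwnProperty guards against inherited names such as "toString".
    return Object.prototype.hasOwnProperty.call(prefixToURI, prefix)
      ? prefixToURI[prefix]
      : null;
  };
}

var tableResolver = makeNSResolver({
  xhtml: 'http://www.w3.org/1999/xhtml',
  mathml: 'http://www.w3.org/1998/Math/MathML'
});
```

The resulting tableResolver could be passed to document.evaluate as the namespaceResolver argument in the same way as NSResolver.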

XPathResult Object Defined Constants

Result Type Defined Constant | Value | Description
ANY_TYPE | 0 | A result set containing whatever type naturally results from evaluation of the expression. Note that if the result is a node-set then UNORDERED_NODE_ITERATOR_TYPE is always the resulting type.
NUMBER_TYPE | 1 | A result containing a single number. This is useful, for example, in an XPath expression using the count() function.
STRING_TYPE | 2 | A result containing a single string.
BOOLEAN_TYPE | 3 | A result containing a single boolean value. This is useful, for example, in an XPath expression using the not() function.
UNORDERED_NODE_ITERATOR_TYPE | 4 | A result node-set containing all the nodes matching the expression. The nodes may not necessarily be in the same order that they appear in the document.
ORDERED_NODE_ITERATOR_TYPE | 5 | A result node-set containing all the nodes matching the expression. The nodes in the result set are in the same order that they appear in the document.
UNORDERED_NODE_SNAPSHOT_TYPE | 6 | A result node-set containing snapshots of all the nodes matching the expression. The nodes may not necessarily be in the same order that they appear in the document.
ORDERED_NODE_SNAPSHOT_TYPE | 7 | A result node-set containing snapshots of all the nodes matching the expression. The nodes in the result set are in the same order that they appear in the document.
ANY_UNORDERED_NODE_TYPE | 8 | A result node-set containing any single node that matches the expression. The node is not necessarily the first node in the document that matches the expression.
FIRST_ORDERED_NODE_TYPE | 9 | A result node-set containing the first node in the document that matches the expression.
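
When debugging, it can be handy to turn the numeric resultType back into the constant name from the table above. This is a small sketch of such a lookup; the names RESULT_TYPE_NAMES and resultTypeName are hypothetical, and the array is simply the table's first column in value order.

```javascript
// Constant names from the table above, indexed by their numeric value.
var RESULT_TYPE_NAMES = [
  'ANY_TYPE',
  'NUMBER_TYPE',
  'STRING_TYPE',
  'BOOLEAN_TYPE',
  'UNORDERED_NODE_ITERATOR_TYPE',
  'ORDERED_NODE_ITERATOR_TYPE',
  'UNORDERED_NODE_SNAPSHOT_TYPE',
  'ORDERED_NODE_SNAPSHOT_TYPE',
  'ANY_UNORDERED_NODE_TYPE',
  'FIRST_ORDERED_NODE_TYPE'
];

// Hypothetical helper: maps a numeric result type to its constant name,
// or to 'UNKNOWN' for anything outside the range 0 to 9.
function resultTypeName(value) {
  var inRange = Number.isInteger(value) &&
                value >= 0 &&
                value < RESULT_TYPE_NAMES.length;
  return inRange ? RESULT_TYPE_NAMES[value] : 'UNKNOWN';
}
```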

Original Document Information

  • Author(s): James Graham
  • Other Contributors: James Thompson
  • Last Updated Date: 2006-3-18
  • Migrated from Mozilla XPath Tutorial

Revision Source

<p>
</p><p>This document describes the interface to access <a href="en/XPath/Functions"> XPath functions</a> using JavaScript.
</p><p>Mozilla implements much of the <a class="external" href="http://www.w3.org/TR/DOM-Level-3-XPath/xpath.html">DOM 3 XPath</a>. This allows XPath expressions to be run against both HTML and XML documents.
</p>
<h3 name="Code">Code</h3>
<p>The simplest interface to XPath is the <a href="en/DOM/document.evaluate">evaluate</a> function of the <a href="en/DOM/document">document</a> object, which returns an object of type <code><a href="en/XPathResult">XPathResult</a></code>:
</p>
<h4 name="The_evaluate_Function_of_the_document_Object">The evaluate Function of the document Object</h4>
<pre>var xpathResult = document.evaluate( xpathExpression, contextNode, namespaceResolver, resultType, result );
</pre>
<p>The <a href="en/DOM/document.evaluate">evaluate</a> function takes a total of five arguments:
</p>
<ul><li><code>xpathExpression</code>: A string containing the XPath expression to be evaluated.
</li></ul>
<ul><li><code>contextNode</code>: A node in the document against which the <code>xpathExpression</code> should be evaluated. The <a href="en/DOM/document">document</a> node is the most commonly used.
</li></ul>
<ul><li><code>namespaceResolver</code>: A function that will be passed any namespace prefixes contained within <code>xpathExpression</code> which returns a string representing the namespace URI associated with that prefix. This enables conversion between the prefixes used in the XPath expressions and the possibly different prefixes used in the document. The function can be either:
</li></ul>
<ul><li><ul><li><a href="#Using_createNSResolver_Method_of_a_XPathEvaluator_Object">Created</a> by using the <code><a class="external" href="http://www.xulplanet.com/references/objref/XPathEvaluator.html#method_createNSResolver">createNSResolver</a></code> method of a <code><a class="external" href="http://www.xulplanet.com/references/objref/XPathEvaluator.html">XPathEvaluator</a></code> object. Use this if you are not sure what you need to use.
</li></ul>
</li></ul>
<ul><li><ul><li><code>null</code>, which is used for HTML documents or when no namespace prefixes are used. Note that if the <code>xpathExpression</code> contains a namespace prefix this will result in a <code>DOMException</code> being thrown with the code <code>NAMESPACE_ERR</code>.
</li></ul>
</li></ul>
<ul><li><ul><li>A custom <a href="#Using_a_User_Defined_Namespace_Resolver">user-defined function</a>.
</li></ul>
</li></ul>
<ul><li><code>resultType</code>: A <a href="#XPathResult_Object_Defined_Constants">constant</a> that defines the desired result type to be returned as a result of the evaluation. The most commonly passed constant is <code>XPathResult.ANY_TYPE</code> which will return the results of the XPath expression as the most natural type.
</li></ul>
<ul><li><code>result</code>: Either an existing <code>XPathResult</code> which is to be reused to return the results, or <code>null</code> to create a new <code>XPathResult</code>.
</li></ul>
<h4 name="Implementing_a_Namespace_Resolver">Implementing a Namespace Resolver</h4>
<p>We will create a namespace resolver using the <code>createNSResolver</code> method of an <code>XPathEvaluator</code> object.
</p>
<pre>  var xpEvaluator = new XPathEvaluator();

  var nsResolver = xpEvaluator.createNSResolver( aNode.ownerDocument == null ? aNode.documentElement : aNode.ownerDocument.documentElement );
</pre>
<p>We then pass <code>nsResolver</code> to <code>document.evaluate</code> as the <code>namespaceResolver</code> argument.
</p>
<h3 name="A_Simple_Example_with_a_HTML_Document">A Simple Example with an HTML Document</h3>
<p>To extract all the <span class="plain">&lt;h2&gt;</span> heading elements in an HTML document using XPath, the <code>xpathExpression</code> is simply '<code>//h2</code>', where <code>//</code> is the recursive descent operator that matches elements with the nodeName <code>h2</code> anywhere in the document tree. The full code for this is:
</p>
<pre>var headings = document.evaluate('//h2', document, null, XPathResult.ANY_TYPE, null);</pre>
<p>Notice that, since HTML does not have namespaces, we have passed <code>null</code> for the <code>namespaceResolver</code> argument. 
</p><p>Since we wish to search over the entire document for the headings, we have used the <a href="en/DOM/document">document</a> object itself as the <code>contextNode</code>.
</p><p>The result of this expression is an <code>XPathResult</code> object. If we wish to know the type of result returned, we may evaluate the <code>resultType</code> property of the returned object. In this case that will evaluate to 4, a <code>UNORDERED_NODE_ITERATOR_TYPE</code>. This is the default return type when the result of the XPath expression is a node-set. It provides access to a single node at a time and may not return nodes in a particular order. To access the returned nodes, we use the <code>iterateNext()</code> method of the returned object:
</p>
<pre>var thisHeading = headings.iterateNext();
var alertText = 'Level 2 headings in this document are:\n';
while (thisHeading) {
  alertText += thisHeading.textContent + '\n';
  thisHeading = headings.iterateNext();
}
</pre>
<p>Once we iterate to a node, we have access to all the standard DOM interfaces on that node. After iterating through all the <code>h2</code> elements returned from our expression, any further calls to <code>iterateNext()</code> will return <code>null</code>.
</p><p><br>
</p>
<h3 name="Return_Types">Return Types</h3>
<p>The result of an XPath expression can be either a simple value (<a href="#Simple_Types">simple types</a>) or a collection of nodes (<a href="#Node-Set_Types">node-set types</a>).
</p>
<h4 name="Simple_Types">Simple Types</h4>
<p>The result of some XPath expressions is not a node set but a simpler type: a floating point number (<code>NUMBER_TYPE</code>), a string (<code>STRING_TYPE</code>), or a boolean value (<code>BOOLEAN_TYPE</code>). For expressions of this type, we still obtain an <code>XPathResult</code> object from our call to <code>document.evaluate()</code>, but we must access the <code>numberValue</code>, <code>stringValue</code> or <code>booleanValue</code> property, respectively, of the <code>XPathResult</code> object to retrieve the result.
</p>
<p>A simple example is using the XPath expression <code><a href="en/XPath/Functions/count">count(//p)</a></code> to obtain the number of <code><span class="plain">&lt;p&gt;</span></code> elements in an HTML document:
</p>
<pre>var paragraphCount = document.evaluate('count(//p)', document, null, XPathResult.ANY_TYPE, null).numberValue;
alert('This document contains ' + paragraphCount + ' paragraph elements');
</pre>
<p>Although JavaScript allows us to convert the number to a string for display, the XPath interface will not automatically convert the numerical result if the <code>stringValue</code> property is requested, so the following code will <b>not</b> work:
</p>
<pre>var paragraphCount = document.evaluate('count(//p)', document, null, XPathResult.ANY_TYPE, null).stringValue;
alert('This document contains ' + paragraphCount + ' paragraph elements');
</pre>
<p>Instead it will throw an exception with the code <code>NS_DOM_TYPE_ERROR</code>. It is possible to request a specific type of return value by changing the <code>resultType</code> argument. In order to force a string return type, we can pass the constant <code>XPathResult.STRING_TYPE</code> as <code>resultType</code>, so the following code will work:
</p>
<pre>var paragraphCount = document.evaluate('count(//p)', document, null, XPathResult.STRING_TYPE, null).stringValue;
alert('This document contains ' + paragraphCount + ' paragraph elements');
</pre>
<h4 name="Node-Set_Types">Node-Set Types</h4>
<p>In the earlier example of matching all the <code>h2</code> headings in a document, the node-set returned was of type <code>UNORDERED_NODE_ITERATOR_TYPE</code>. The XPath interface allows node-sets to be returned in a variety of different ways. There are 3 principal categories of node-set types that can be returned:
</p>
<ul><li><b>Iterators</b>: An iterator which allows access to the matched nodes one at a time. The next node can be accessed with the <code>iterateNext()</code> method of the <code>XPathResult</code> object. If the document is modified between iterations, the iteration is invalidated and the <code>invalidIteratorState</code> property is set to <code>true</code>.
</li></ul>
<ul><li><b>Snapshots</b>: A static list of matched nodes, which are accessed through the <code>snapshotItem(itemNumber)</code> method of <code>XPathResult</code>, where <code>itemNumber</code> is the index of the node to be retrieved. The number of nodes returned can be accessed through the <code>snapshotLength</code> property. Snapshots do not change with document mutations, so unlike the iterator result the snapshot does not become invalid. It may, however, no longer correspond to the current document: nodes may have been moved, the snapshot might contain nodes that no longer exist, or it might be an incomplete set of results.
</li></ul>
<ul><li><b>First Nodes</b>: Only the first found node matching the XPath expression is returned. This may be accessed through the <code>singleNodeValue</code> property of the <code>XPathResult</code> object. This will be <code>null</code> if the node set is empty. For the unordered subtype the single node returned might not be the first in document order.
</li></ul>
<p>For each of these node set types, there are 2 subtypes:
</p>
<ul><li> Ordered: Guaranteed to retain the nodes in the document order.
</li><li> Unordered: May not produce nodes in a particular order.
</li></ul>
<p><span class="comment">===Node-set type Example===   //XXX - want an example with add and remove nodes and probably also with contextNode != document  &lt;pre&gt;  &lt;/pre&gt;  ==Using XPath with XML documents==  The examples so far have been designed to work with HTML documents in which there are no namespaces. In order to use XPath on {{mediawiki.internal('XML documents', "en")}}, we must provide a  namespace resolver function for converting namespace prefixes in the document.</span>
</p>
<h3 name="Appendix">Appendix</h3>
<h4 name="Using_a_User_Defined_Namespace_Resolver">Using a User Defined Namespace Resolver</h4>
<p>This function will need to take namespace prefixes from the <code>xpathExpression</code> and return the <a href="en/URI">URI</a> that corresponds to that prefix. For example, the expression:
</p>
<pre>//xhtml:td/mathml:math
</pre>
<p>will select all <a href="en/MathML">MathML</a> expressions that are the children of (X)HTML table data cell elements. 
</p><p>To associate the <code>mathml:</code> prefix with the namespace URI '<code>http://www.w3.org/1998/Math/MathML</code>' and the <code>xhtml:</code> prefix with the URI '<code>http://www.w3.org/1999/xhtml</code>', we provide a function:
</p>
<pre>function NSResolver( prefix ) 
{
  if ( prefix == 'xhtml' ) 
  {
    return 'http://www.w3.org/1999/xhtml';
  }
  else if ( prefix == 'mathml' ) 
  {
    return 'http://www.w3.org/1998/Math/MathML';
  }
  else
  {
    return null;
  }
}
</pre>
<p>Our call to <code>document.evaluate</code> would then look like:
</p>
<pre>document.evaluate('//xhtml:td/mathml:math', document, NSResolver, XPathResult.ANY_TYPE, null);
</pre>
<h4 name="XPathResult_Object_Defined_Constants">XPathResult Object Defined Constants</h4>
<table class="standard-table">

<tbody><tr>
<td class="header">Result Type Defined Constant
</td><td class="header">Value
</td><td class="header">Description
</td></tr>

<tr>
<td>ANY_TYPE
</td><td>0
</td><td>A result set containing whatever type naturally results from evaluation of the expression. Note that if the result is a node-set then UNORDERED_NODE_ITERATOR_TYPE is always the resulting type.
</td></tr>

<tr>
<td>NUMBER_TYPE
</td><td>1
</td><td>A result containing a single number. This is useful for example, in an XPath expression using the <code>count()</code> function.
</td></tr>

<tr>
<td>STRING_TYPE
</td><td>2
</td><td>A result containing a single string.
</td></tr>

<tr>
<td>BOOLEAN_TYPE
</td><td>3
</td><td>A result containing a single boolean value. This is useful for example, in an XPath expression using the <code>not()</code> function.
</td></tr>

<tr>
<td>UNORDERED_NODE_ITERATOR_TYPE
</td><td>4
</td><td>A result node-set containing all the nodes matching the expression. The nodes may not necessarily be in the same order that they appear in the document.
</td></tr>

<tr>
<td>ORDERED_NODE_ITERATOR_TYPE
</td><td>5
</td><td>A result node-set containing all the nodes matching the expression. The nodes in the result set are in the same order that they appear in the document.
</td></tr>

<tr>
<td>UNORDERED_NODE_SNAPSHOT_TYPE
</td><td>6
</td><td>A result node-set containing snapshots of all the nodes matching the expression. The nodes may not necessarily be in the same order that they appear in the document.
</td></tr>

<tr>
<td>ORDERED_NODE_SNAPSHOT_TYPE
</td><td>7
</td><td>A result node-set containing snapshots of all the nodes matching the expression. The nodes in the result set are in the same order that they appear in the document.
</td></tr>

<tr>
<td>ANY_UNORDERED_NODE_TYPE
</td><td>8
</td><td>A result node-set containing any single node that matches the expression. The node is not necessarily the first node in the document that matches the expression.
</td></tr>

<tr>
<td>FIRST_ORDERED_NODE_TYPE
</td><td>9
</td><td>A result node-set containing the first node in the document that matches the expression.
</td></tr>
</tbody></table>
<div class="originaldocinfo">
<h2 name="Original_Document_Information"> Original Document Information </h2>
<ul><li> Author(s): James Graham
</li><li> Other Contributors: James Thompson
</li><li> Last Updated Date: 2006-3-18
</li><li> Migrated from <a class="external" href="http://www-xray.ast.cam.ac.uk/~jgraham/mozilla/xpath-tutorial.html">Mozilla XPath Tutorial</a>
</li></ul>
</div>