Using XPath in JavaScript

This document describes the interface for using XPath in JavaScript internally, in extensions, and from websites. Mozilla implements a fair amount of DOM 3 XPath, so XPath expressions can be run against both HTML and XML documents.

The main interface for using XPath is the evaluate function of the Document object.

document.evaluate

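This method evaluates XPath expressions against an XML-based document (including HTML documents) and returns an XPathResult object. The existing documentation for this method is located at document.evaluate, but it is rather sparse for our needs at the moment.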

var xpathResult = document.evaluate( xpathExpression, contextNode, namespaceResolver, resultType, result );

Parameters

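The evaluate function takes a total of five parameters:

  • xpathExpression: A string containing the XPath expression to be evaluated.
  • contextNode: A node in the document against which the xpathExpression should be evaluated. The Document node is the most commonly used.
  • namespaceResolver: A function that is passed any namespace prefix contained within xpathExpression and returns a string representing the namespace URI associated with that prefix. This enables conversion between the prefixes used in the XPath expression and the possibly different prefixes used in the document. The function can be any of the following:
    • Created by using the createNSResolver method of an XPathEvaluator object. You should use this virtually all of the time.
    • null, which can be used for HTML documents or when no namespace prefixes are used. Note that if the xpathExpression contains a namespace prefix, a DOMException with the code NAMESPACE_ERR is thrown.
    • A custom user-defined function. See the Implementing a User Defined Namespace Resolver section in the appendix for details.
  • resultType: A constant that specifies the desired type of the result returned by the evaluation. The most commonly passed constant is XPathResult.ANY_TYPE, which returns the result of the XPath expression as the most natural type. The appendix contains a full list of the available constants. They are explained below in the section "Specifying the Return Type".
  • result: If an existing XPathResult object is specified, it is reused to return the results. Specifying null creates a new XPathResult object.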
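
The five parameters can be seen together in a small wrapper. The following is a minimal sketch, not part of the DOM API: the function name evaluateXPath and its options object are hypothetical, and the numeric default for resultType is the value of XPathResult.ANY_TYPE listed in the appendix.

```javascript
// Numeric value of XPathResult.ANY_TYPE (see the constants table in the appendix).
const ANY_TYPE = 0;

// Hypothetical helper: evaluates `expression` against `doc`, filling in
// common defaults for the remaining four parameters of evaluate().
function evaluateXPath(doc, expression, options) {
  const opts = options || {};
  return doc.evaluate(
    expression,                       // xpathExpression
    opts.contextNode || doc,          // contextNode: defaults to the document itself
    opts.namespaceResolver || null,   // namespaceResolver: null suits unprefixed expressions
    opts.resultType === undefined ? ANY_TYPE : opts.resultType, // resultType
    opts.result || null               // result: null creates a new XPathResult
  );
}
```

In a page, evaluateXPath(document, '//h2') would make the same call as document.evaluate('//h2', document, null, XPathResult.ANY_TYPE, null).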

Return Value

Returns an XPathResult object of the type specified in the resultType parameter. See the XPathResult interface documentation for its definition.

Implementing a Default Namespace Resolver

We create a namespace resolver using the createNSResolver method of the document object.

var nsResolver = document.createNSResolver( contextNode.ownerDocument == null ? contextNode.documentElement : contextNode.ownerDocument.documentElement );

Alternatively, use the createNSResolver method of an XPathEvaluator object:

var xpEvaluator = new XPathEvaluator();
var nsResolver = xpEvaluator.createNSResolver( contextNode.ownerDocument == null ? contextNode.documentElement : contextNode.ownerDocument.documentElement );

Then pass the nsResolver variable to document.evaluate as the namespaceResolver parameter.

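Note: XPath defines QNames without a prefix to match only elements in the null namespace. There is no way in XPath to pick up the default namespace as applied to a regular element reference (e.g., p[@id='_myid'] for xmlns='http://www.w3.org/1999/xhtml'). To match default elements in a non-null namespace, you either have to refer to a particular element using a form such as [namespace-uri()='http://www.w3.org/1999/xhtml' and name()='p' and @id='_myid'] (this approach works well for dynamic XPath expressions where the namespaces might not be known), or use prefixed name tests and create a namespace resolver that maps the prefix to the namespace. If you wish to take the latter approach, see how to create a user-defined namespace resolver in the appendix.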

Notes

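createNSResolver adapts any DOM node to resolve namespaces so that an XPath expression can be easily evaluated relative to the context of the node where it appeared within the document. This adapter works like the DOM Level 3 method lookupNamespaceURI on nodes: it resolves the namespace URI from a given prefix using the information available in the node's hierarchy at the time lookupNamespaceURI is called. It also correctly resolves the implicit xml prefix.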
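
The behavior described above can be approximated by hand. The sketch below is an illustration only, not the browser's implementation: makeNodeResolver is a hypothetical name, and the node passed in is assumed to support the DOM Level 3 lookupNamespaceURI method.

```javascript
// Namespace URI to which the implicit "xml" prefix is always bound.
const XML_NAMESPACE = 'http://www.w3.org/XML/1998/namespace';

// Hypothetical helper: returns a resolver function that looks prefixes up
// in the namespace declarations in scope at `node`.
function makeNodeResolver(node) {
  return function (prefix) {
    if (prefix === 'xml') {
      return XML_NAMESPACE;
    }
    // lookupNamespaceURI() returns null when the prefix is not declared.
    return node.lookupNamespaceURI(prefix);
  };
}
```

The returned function can be passed as the namespaceResolver parameter, for example makeNodeResolver(document.documentElement).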

Specifying the Return Type

The variable xpathResult returned from document.evaluate holds either a single simple value (simple types) or a collection of nodes (node-set types).

Simple Types

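When the desired result type in resultType is specified as one of:

  • NUMBER_TYPE - a double
  • STRING_TYPE - a string
  • BOOLEAN_TYPE - a boolean

We obtain the returned value of the expression by accessing the following properties, respectively, of the XPathResult object: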

  • numberValue
  • stringValue
  • booleanValue
Example

The following uses the XPath expression count(//p) to obtain the number of <p> elements in an HTML document:

var paragraphCount = document.evaluate( 'count(//p)', document, null, XPathResult.ANY_TYPE, null );

alert( 'This document contains ' + paragraphCount.numberValue + ' paragraph elements' );

Although JavaScript allows us to convert the number to a string for display, the XPath interface will not automatically convert the numerical result if the stringValue property is requested, so the following code will not work:

var paragraphCount = document.evaluate('count(//p)', document, null, XPathResult.ANY_TYPE, null );

alert( 'This document contains ' + paragraphCount.stringValue + ' paragraph elements' );

Instead, it throws an exception with the code NS_DOM_TYPE_ERROR.
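
One way to avoid the exception is to request the type you intend to read. The sketch below relies on DOM 3 XPath coercing the result to the requested type using XPath's own conversions, so asking for STRING_TYPE yields a populated stringValue. The name paragraphCountText is hypothetical, and the constant is written as a number so that the snippet does not depend on a global XPathResult.

```javascript
// Numeric value of XPathResult.STRING_TYPE (see the appendix).
const STRING_TYPE = 2;

// Hypothetical helper: returns the number of <p> elements as a string by
// asking the XPath engine to perform the number-to-string conversion.
function paragraphCountText(doc) {
  const result = doc.evaluate('count(//p)', doc, null, STRING_TYPE, null);
  return result.stringValue;
}
```

Alternatively, keep ANY_TYPE and convert in JavaScript with String(paragraphCount.numberValue).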

Node-Set Types

The XPathResult object allows node-sets to be returned in three principal types:

Iterators

When the specified result type in the resultType parameter is either:

  • UNORDERED_NODE_ITERATOR_TYPE
  • ORDERED_NODE_ITERATOR_TYPE

The XPathResult object returned is a node-set of matched nodes which will behave as an iterator, allowing us to access the individual nodes contained by using the iterateNext() method of the XPathResult.

Once we have iterated over all of the individual matched nodes, iterateNext() will return null.

Note, however, that if the document is mutated (the document tree is modified) between iterations, the iteration is invalidated: the invalidIteratorState property of the XPathResult is set to true, and the next call to iterateNext() throws an NS_ERROR_DOM_INVALID_STATE_ERR exception.

Iterator Example
var iterator = document.evaluate('//phoneNumber', documentNode, null, XPathResult.UNORDERED_NODE_ITERATOR_TYPE, null );

try {
  var thisNode = iterator.iterateNext();
  
  while (thisNode) {
    alert( thisNode.textContent );
    thisNode = iterator.iterateNext();
  }	
}
catch (e) {
  alert( 'Error: Document tree modified during iteration ' + e );
}
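
If the matched nodes need to be modified, one option is to drain the iterator first, so that no mutation happens between calls to iterateNext(). This is a minimal sketch; collectNodes is a hypothetical helper name, not part of the XPathResult interface.

```javascript
// Hypothetical helper: copies every node from an iterator-type XPathResult
// into a plain array. Any exception thrown by iterateNext() propagates.
function collectNodes(iterator) {
  const nodes = [];
  let node = iterator.iterateNext();
  while (node) {
    nodes.push(node);
    node = iterator.iterateNext();
  }
  return nodes;
}
```

The resulting array is independent of the XPathResult, so the document can be changed freely while looping over it.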
Snapshots

When the specified result type in the resultType parameter is either:

  • UNORDERED_NODE_SNAPSHOT_TYPE
  • ORDERED_NODE_SNAPSHOT_TYPE

The XPathResult object returned is a static node-set of matched nodes, which allows us to access each node through the snapshotItem(itemNumber) method of the XPathResult object, where itemNumber is the index of the node to be retrieved. The total number of nodes contained can be accessed through the snapshotLength property.

Snapshots do not change with document mutations, so unlike iterators, a snapshot never becomes invalid. It may, however, no longer correspond to the current document: nodes may have been moved, the snapshot might contain nodes that no longer exist, or new nodes could have been added.

Snapshot Example
var nodesSnapshot = document.evaluate('//phoneNumber', documentNode, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null );

for ( var i=0 ; i < nodesSnapshot.snapshotLength; i++ )
{
  alert( nodesSnapshot.snapshotItem(i).textContent );
}
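
Because a snapshot is static, the document can be modified while looping over it. The following sketch removes every matched node from its parent; removeSnapshotNodes is a hypothetical helper name.

```javascript
// Hypothetical helper: detaches each node in a snapshot-type XPathResult
// from the document and returns how many nodes were removed.
function removeSnapshotNodes(snapshot) {
  let removed = 0;
  for (let i = 0; i < snapshot.snapshotLength; i++) {
    const node = snapshot.snapshotItem(i);
    // A node may already be detached, for example if an ancestor was removed elsewhere.
    if (node.parentNode) {
      node.parentNode.removeChild(node);
      removed++;
    }
  }
  return removed;
}
```

Doing the same inside an iterator loop would invalidate the iterator after the first removal.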
First Node

When the specified result type in the resultType parameter is either:

  • ANY_UNORDERED_NODE_TYPE
  • FIRST_ORDERED_NODE_TYPE

The XPathResult object returned is only the first found node that matched the XPath expression. This can be accessed through the singleNodeValue property of the XPathResult object. This will be null if the node set is empty.

Note that, for the unordered subtype the single node returned might not be the first in document order, but for the ordered subtype you are guaranteed to get the first matched node in the document order.

First Node Example
var firstPhoneNumber = document.evaluate('//phoneNumber', documentNode, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null );

alert( 'The first phone number found is ' + firstPhoneNumber.singleNodeValue.textContent );
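
Since singleNodeValue is null when nothing matches, reading textContent directly can fail on documents without a match. A guarded version might look like the sketch below; firstNodeText is a hypothetical name, and the constant is the numeric value of XPathResult.FIRST_ORDERED_NODE_TYPE from the appendix.

```javascript
// Numeric value of XPathResult.FIRST_ORDERED_NODE_TYPE (see the appendix).
const FIRST_ORDERED_NODE_TYPE = 9;

// Hypothetical helper: returns the text of the first node matching
// `expression` in document order, or null when the node set is empty.
function firstNodeText(doc, expression, contextNode) {
  const result = doc.evaluate(
    expression, contextNode || doc, null, FIRST_ORDERED_NODE_TYPE, null
  );
  const node = result.singleNodeValue;
  return node ? node.textContent : null;
}
```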

The ANY_TYPE Constant

When the result type in the resultType parameter is specified as ANY_TYPE, the XPathResult object returned will be of whatever type naturally results from the evaluation of the expression.

It could be any of the simple types (NUMBER_TYPE, STRING_TYPE, BOOLEAN_TYPE), but, if the returned result type is a node-set then it will only be an UNORDERED_NODE_ITERATOR_TYPE.

To determine that type after evaluation, we use the resultType property of the XPathResult object. The constant values of this property are defined in the appendix.
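
As an illustration of checking resultType after an ANY_TYPE evaluation, the sketch below converts a result into a plain JavaScript value. The function name unwrapResult is hypothetical; the numeric constants are the XPathResult values listed in the appendix.

```javascript
// Numeric values of the XPathResult constants an ANY_TYPE evaluation can produce.
const NUMBER_TYPE = 1;
const STRING_TYPE = 2;
const BOOLEAN_TYPE = 3;
const UNORDERED_NODE_ITERATOR_TYPE = 4;

// Hypothetical helper: returns a number, string, boolean, or array of nodes
// depending on the type the expression naturally produced.
function unwrapResult(result) {
  switch (result.resultType) {
    case NUMBER_TYPE:
      return result.numberValue;
    case STRING_TYPE:
      return result.stringValue;
    case BOOLEAN_TYPE:
      return result.booleanValue;
    case UNORDERED_NODE_ITERATOR_TYPE: {
      const nodes = [];
      let node;
      while ((node = result.iterateNext())) {
        nodes.push(node);
      }
      return nodes;
    }
    default:
      throw new Error('Unexpected result type: ' + result.resultType);
  }
}
```

With this helper, an expression such as 'count(//p)' would come back as a number and '//p' as an array of elements.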

Examples

Within an HTML Document

The following code is intended to be placed in any JavaScript fragment within or linked to the HTML document against which the XPath expression is to be evaluated.

To extract all the <h2> heading elements in an HTML document using XPath, the xpathExpression is '//h2', where // is the recursive descent operator, which matches elements with the nodeName h2 anywhere in the document tree. The full code for this is:

var headings = document.evaluate('//h2', document, null, XPathResult.ANY_TYPE, null );

Notice that, since HTML does not have namespaces, we have passed null for the namespaceResolver parameter.

Since we wish to search over the entire document for the headings, we have used the document object itself as the contextNode.

The result of this expression is an XPathResult object. If we wish to know the type of result returned, we may evaluate the resultType property of the returned object. In this case, that will evaluate to 4, an UNORDERED_NODE_ITERATOR_TYPE. This is the default return type when the result of the XPath expression is a node set. It provides access to a single node at a time and may not return nodes in a particular order. To access the returned nodes, we use the iterateNext() method of the returned object:

var thisHeading = headings.iterateNext();

var alertText = 'Level 2 headings in this document are:\n';

while (thisHeading) {
  alertText += thisHeading.textContent + '\n';
  thisHeading = headings.iterateNext();
}

Once we iterate to a node, we have access to all the standard DOM interfaces on that node. After iterating through all the h2 elements returned from our expression, any further calls to iterateNext() will return null.

Evaluating against an XML document within an Extension

The following uses an XML document located at chrome://yourextension/content/peopleDB.xml as an example.

<?xml version="1.0"?>
<people xmlns:xul = "http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul" >
  <person>
	<name first="george" last="bush" />
	<address street="1600 pennsylvania avenue" city="washington" country="usa"/>
	<phoneNumber>202-456-1111</phoneNumber>
  </person>
  <person>
	<name first="tony" last="blair" />
	<address street="10 downing street" city="london" country="uk"/>
	<phoneNumber>020 7925 0918</phoneNumber>
  </person>
</people>

To make the contents of the XML document available within the extension, we create an XMLHttpRequest object to load the document synchronously. The variable xmlDoc will then contain the document as an XMLDocument object, against which we can use the evaluate method.

JavaScript used in the extension's XUL/JS documents:

var req = new XMLHttpRequest();

req.open("GET", "chrome://yourextension/content/peopleDB.xml", false); 
req.send(null);

var xmlDoc = req.responseXML;		

var nsResolver = xmlDoc.createNSResolver( xmlDoc.ownerDocument == null ? xmlDoc.documentElement : xmlDoc.ownerDocument.documentElement);

var personIterator = xmlDoc.evaluate('//person', xmlDoc, nsResolver, XPathResult.ANY_TYPE, null );
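
To read data out of each person element, relative XPath expressions can be evaluated with that element as the contextNode. The following is a sketch under a few assumptions: listPeople is a hypothetical name, the document has the structure of peopleDB.xml above, and a null resolver is used because the expressions contain no prefixes. The numeric constants are the XPathResult values from the appendix.

```javascript
// Numeric values of XPathResult.STRING_TYPE and ORDERED_NODE_SNAPSHOT_TYPE.
const STRING_TYPE = 2;
const ORDERED_NODE_SNAPSHOT_TYPE = 7;

// Hypothetical helper: returns one { first, phone } object per <person>.
function listPeople(xmlDoc) {
  const people = xmlDoc.evaluate(
    '//person', xmlDoc, null, ORDERED_NODE_SNAPSHOT_TYPE, null
  );
  const entries = [];
  for (let i = 0; i < people.snapshotLength; i++) {
    const person = people.snapshotItem(i);
    // These expressions are relative, so they are evaluated against `person`.
    const first = xmlDoc.evaluate(
      'string(name/@first)', person, null, STRING_TYPE, null
    ).stringValue;
    const phone = xmlDoc.evaluate(
      'string(phoneNumber)', person, null, STRING_TYPE, null
    ).stringValue;
    entries.push({ first: first, phone: phone });
  }
  return entries;
}
```

For the sample document, listPeople(xmlDoc) would return two entries, the first with first set to 'george' and phone set to '202-456-1111'.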

Note

When the XPathResult object is not defined, the constants can be retrieved in privileged code using Components.interfaces.nsIDOMXPathResult.ANY_TYPE (CI.nsIDOMXPathResult). Similarly, an XPathEvaluator can be created using:

Components.classes["@mozilla.org/dom/xpath-evaluator;1"].createInstance(Components.interfaces.nsIDOMXPathEvaluator)

Appendix

Implementing a User Defined Namespace Resolver

This is an example for illustration only. This function will need to take namespace prefixes from the xpathExpression and return the URI that corresponds to that prefix. For example, the expression:

'//xhtml:td/mathml:math'

will select all MathML expressions that are the children of (X)HTML table data cell elements.

In order to associate the 'mathml:' prefix with the namespace URI 'http://www.w3.org/1998/Math/MathML' and 'xhtml:' with the URI 'http://www.w3.org/1999/xhtml' we provide a function:

function nsResolver(prefix) {
  var ns = {
    'xhtml' : 'http://www.w3.org/1999/xhtml',
    'mathml': 'http://www.w3.org/1998/Math/MathML'
  };
  return ns[prefix] || null;
}

Our call to document.evaluate would then look like:

document.evaluate( '//xhtml:td/mathml:math', document, nsResolver, XPathResult.ANY_TYPE, null );

Implementing a default namespace for XML documents

As noted in the Implementing a Default Namespace Resolver section above, the default resolver does not handle the default namespace for XML documents. For example, with this document:

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <entry />
    <entry />
    <entry />
</feed>

doc.evaluate('//entry', doc, nsResolver, XPathResult.ANY_TYPE, null) will return an empty set, where nsResolver is the resolver returned by createNSResolver. Passing a null resolver doesn't work any better, either.

One possible workaround is to create a custom resolver that returns the correct default namespace (the Atom namespace in this case). Note that you still have to use some namespace prefix in your XPath expression, so that the resolver function will be able to change it to your required namespace. E.g.:

function resolver() {
    return 'http://www.w3.org/2005/Atom';
}
doc.evaluate('//myns:entry', doc, resolver, XPathResult.ANY_TYPE, null)

Note that a more complex resolver will be required if the document uses multiple namespaces.
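
A resolver for several namespaces could be built from a prefix-to-URI map, as in the sketch below. The name makeResolver and the prefixes used in the usage note (atom, xhtml) are hypothetical choices; any prefix works as long as the XPath expression and the map agree.

```javascript
// Hypothetical helper: builds a namespace resolver from a map of
// prefix -> namespace URI. Unknown prefixes resolve to null.
function makeResolver(prefixMap) {
  return function (prefix) {
    // hasOwnProperty avoids accidentally resolving inherited keys
    // such as "constructor" or "toString".
    return Object.prototype.hasOwnProperty.call(prefixMap, prefix)
      ? prefixMap[prefix]
      : null;
  };
}
```

For an Atom feed that embeds XHTML, one might create makeResolver({ atom: 'http://www.w3.org/2005/Atom', xhtml: 'http://www.w3.org/1999/xhtml' }) and evaluate an expression such as '//atom:entry//xhtml:div'.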

An approach which might work better (and allow namespaces not to be known ahead of time) is described in the next section.

Using XPath functions to reference elements with a default namespace

Another approach to match default elements in a non-null namespace (and one which works well for dynamic XPath expressions where the namespaces might not be known), involves referring to a particular element using a form such as [namespace-uri()='http://www.w3.org/1999/xhtml' and name()='p' and @id='_myid']. This circumvents the problem of an XPath query not being able to detect the default namespace on a regularly labeled element.
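
Expressions of this form are tedious to write by hand, so they can be generated. The sketch below is one possible way to do that: namespacedStep is a hypothetical name, and it uses local-name() rather than name() so that prefixed elements in the same namespace also match.

```javascript
// Hypothetical helper: builds an XPath step that matches elements by
// namespace URI and local name, independent of any prefix.
function namespacedStep(localName, namespaceURI) {
  // XPath 1.0 string literals have no escape syntax, so this sketch
  // rejects values that contain a single quote.
  if (localName.indexOf("'") !== -1 || namespaceURI.indexOf("'") !== -1) {
    throw new Error('Values containing single quotes are not supported');
  }
  return "*[namespace-uri()='" + namespaceURI +
    "' and local-name()='" + localName + "']";
}
```

For the Atom document shown earlier, '//' + namespacedStep('entry', 'http://www.w3.org/2005/Atom') produces //*[namespace-uri()='http://www.w3.org/2005/Atom' and local-name()='entry'], which can be evaluated with a null resolver.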

Getting specifically namespaced elements and attributes regardless of prefix

If one wishes to provide flexibility in namespaces (as they are intended) by not necessarily requiring a particular prefix to be used when finding a namespaced element or attribute, one must use special techniques.

One can adapt the approach in the above section to test for namespaced elements regardless of the prefix chosen (using local-name() in combination with namespace-uri() instead of name()). A more challenging situation occurs, however, if one wishes to grab an element with a particular namespaced attribute in a predicate (given the absence of implementation-independent variables in XPath 1.0).

For example, one might try (incorrectly) to grab an element with a namespaced attribute as follows:

var xpathlink = 'someElements[local-name(@*)="href" and namespace-uri(@*)="http://www.w3.org/1999/xlink"]';

This could inadvertently grab an element if one of its attributes had the local name "href" while a different attribute, rather than @href itself, carried the targeted (XLink) namespace.

In order to accurately grab elements with the XLink @href attribute (without also being confined to predefined prefixes in a namespace resolver), one could obtain them as follows:

var xpathEls = 'someElements[@*[local-name() = "href" and namespace-uri() = "http://www.w3.org/1999/xlink"]]'; // Grabs elements with any single attribute that has both the local name 'href' and the XLink namespace
var thislevel = xml.evaluate(xpathEls, xml, null, XPathResult.ANY_TYPE, null);
var thisitemEl = thislevel.iterateNext();

XPathResult Defined Constants

Result Type Defined Constant | Value | Description
ANY_TYPE | 0 | A result set containing whatever type naturally results from the evaluation of the expression. Note that if the result is a node-set then UNORDERED_NODE_ITERATOR_TYPE is always the resulting type.
NUMBER_TYPE | 1 | A result containing a single number. This is useful, for example, in an XPath expression using the count() function.
STRING_TYPE | 2 | A result containing a single string.
BOOLEAN_TYPE | 3 | A result containing a single boolean value. This is useful, for example, in an XPath expression using the not() function.
UNORDERED_NODE_ITERATOR_TYPE | 4 | A result node-set containing all the nodes matching the expression. The nodes may not necessarily be in the same order that they appear in the document.
ORDERED_NODE_ITERATOR_TYPE | 5 | A result node-set containing all the nodes matching the expression. The nodes in the result set are in the same order that they appear in the document.
UNORDERED_NODE_SNAPSHOT_TYPE | 6 | A result node-set containing snapshots of all the nodes matching the expression. The nodes may not necessarily be in the same order that they appear in the document.
ORDERED_NODE_SNAPSHOT_TYPE | 7 | A result node-set containing snapshots of all the nodes matching the expression. The nodes in the result set are in the same order that they appear in the document.
ANY_UNORDERED_NODE_TYPE | 8 | A result node-set containing any single node that matches the expression. The node is not necessarily the first node in the document that matches the expression.
FIRST_ORDERED_NODE_TYPE | 9 | A result node-set containing the first node in the document that matches the expression.
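
When debugging, it can help to turn the numeric resultType back into its constant name. The lookup below is a small sketch built directly from the table above; resultTypeName is a hypothetical helper name.

```javascript
// Constant names indexed by their numeric value, as listed in the table above.
const RESULT_TYPE_NAMES = [
  'ANY_TYPE',
  'NUMBER_TYPE',
  'STRING_TYPE',
  'BOOLEAN_TYPE',
  'UNORDERED_NODE_ITERATOR_TYPE',
  'ORDERED_NODE_ITERATOR_TYPE',
  'UNORDERED_NODE_SNAPSHOT_TYPE',
  'ORDERED_NODE_SNAPSHOT_TYPE',
  'ANY_UNORDERED_NODE_TYPE',
  'FIRST_ORDERED_NODE_TYPE'
];

// Hypothetical helper: maps a resultType value to its constant name.
function resultTypeName(value) {
  return RESULT_TYPE_NAMES[value] || 'UNKNOWN(' + value + ')';
}
```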

Original Document Information

  • Based Upon Original Document Mozilla XPath Tutorial
  • Original Source Author: James Graham.
  • Other Contributors: James Thompson.
  • Last Updated Date: 2006-3-25.