Processing XML with E4X

Warning: E4X is obsolete. It was disabled by default for chrome in Firefox 17 and completely removed in Firefox 21. Use DOMParser/DOMSerializer or a non-native JXON algorithm instead.

First introduced in JavaScript 1.6, E4X adds a native XML object to the JavaScript language, along with syntax for embedding literal XML documents in JavaScript code.

A full definition of E4X can be found in the Ecma-357 specification. This chapter provides a practical overview of the language; it is not a complete reference.

Compatibility issues

Prior to widespread browser support for the <script> element, it was common for JavaScript embedded in a page to be surrounded by HTML comment tags, to prevent <script>-unaware browsers from displaying JavaScript code to the user. This practice is no longer necessary, but it remains in some legacy code. For backwards compatibility, E4X ignores comments and CDATA sections by default. You can add an e4x=1 argument to your <script> tag to disable this restriction:

<script type="text/javascript;e4x=1">
...
</script>

Creating an XML object

E4X offers two principal ways of creating an XML object. The first is to pass a string to the XML constructor:

 var languages = new XML('<languages type="dynamic"><lang>JavaScript</lang><lang>Python</lang></languages>');

The second is to embed the XML directly in your script, as an XML literal:

 var languages = <languages type="dynamic">
   <lang>JavaScript</lang>
   <lang>Python</lang>
 </languages>;

In both cases, the resulting object will be an E4X XML object, which provides convenient syntax for both accessing and updating the encapsulated data.

While the XML object looks and behaves in a similar way to a regular JavaScript object, the two are not the same thing. E4X introduces new syntax that only works with E4X XML objects. The syntax is designed to be familiar to JavaScript programmers, but E4X does not provide a direct mapping from XML to native JavaScript objects; just the illusion of one.

It is possible to interpolate variables into an XML literal to create an element name (or to create content).

var h = 'html';
var text = "Here's some text";
var doc = <{h}><body>{text}</body></{h}>;
alert(doc.toXMLString());
/* Gives:
<html>
  <body>Here's some text</body>
</html>
*/

Working with attributes

XML literal syntax has a significant advantage over the XML constructor when you need to create markup dynamically. With E4X it is easy to embed dynamic values in markup. Variables and expressions can be used to create attribute values by simply wrapping them with braces ({}) and omitting quotation marks that would normally go around an attribute value, as the following example illustrates:

 var a = 2;
 var b = <foo bar={a}>"hi"</foo>;

Upon execution the variable is evaluated and quotes are automatically added where appropriate. The preceding example would result in an XML object which looks like this: <foo bar="2">"hi"</foo>.

In attribute substitution, quotation marks are escaped as &quot; while apostrophes are handled normally.

var b = 'He said "Don\'t go there."';
var el = <foo a={b}/>;
alert(el.toXMLString());
// Gives: <foo a="He said &quot;Don't go there.&quot;"/>

Less-than signs and ampersands are escaped into their entity equivalents. Because the greater-than sign is not escaped, an XML error can result if the CDATA closing sequence (]]>) is included.
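The escaping rules described here can be mimicked in standard JavaScript. The following is a minimal sketch covering only the characters mentioned in this section, not the engine's actual implementation; the helper name escapeAttributeValue is hypothetical.

```javascript
// Hypothetical helper (not part of E4X): mimics the attribute-value
// escaping described in this section, for illustration only.
function escapeAttributeValue(value) {
    return String(value)
        .replace(/&/g, '&amp;')    // ampersands first, so the entities added below are not re-escaped
        .replace(/</g, '&lt;')     // less-than signs
        .replace(/"/g, '&quot;');  // quotation marks; apostrophes and > are left untouched
}

var escaped = escapeAttributeValue('He said "Don\'t go there."');
var escapedMarkup = escapeAttributeValue('a < b & c > d');
```

Replacing the ampersand first matters: doing it last would turn an already-produced &lt; into &amp;lt;.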

It is not possible, however, to interpolate variables directly amidst other literal (or variable) attribute content (e.g., bar="a{var1}{var2}"). Instead, one must do one of the following: calculate the value with a JavaScript expression (e.g., bar={'a'+var1+var2}); define a new variable before the element literal that holds the full interpolated value, and then include that variable; or retrieve the attribute after the literal and alter it (see below).

While one can interpolate attribute names as well as attribute values:

var a = 'att';
var b = <b {a}='value'/>;
alert(b);
// Gives: <b att="value"/>

...one cannot interpolate a whole attribute expression at once (e.g., <b {a}>).

After executing the languages example from "Creating an XML object" above, the variable languages references an XML object corresponding to the <languages> node in the XML document. This node has one attribute, type, which can be accessed and updated in a number of ways:

 alert(languages.@type); // Alerts "dynamic"
 languages.@type = "agile";
 alert(languages.@type); // Alerts "agile"
 alert(languages.toString());
 /* Alerts:
   <languages type="agile"><lang>JavaScript</lang><lang>Python</lang></languages>
 */

Note that a retrieved attribute is an XML object, not a string primitive. To compare it with a string using strict equality (===), it is necessary to convert the attribute first, even though the attribute is converted to a string automatically in other contexts (such as insertion into a textbox).

if (languages.@type.toString() === 'agile') {
...
}

or, more simply, use loose equality (==), which converts the attribute before comparing:

if (languages.@type == 'agile') {
...
}

Working with XML objects

XML objects provide a number of methods for inspecting and updating their contents. They support JavaScript's regular dot and [] notation, but instead of accessing object properties, E4X overloads these operators to access the element's children:

var person = <person>
  <name>Bob Smith</name>
  <likes>
    <os>Linux</os>
    <browser>Firefox</browser>
    <language>JavaScript</language>
    <language>Python</language>
  </likes>
</person>;

alert(person.name); // Bob Smith
alert(person['name']); // Bob Smith
alert(person.likes.browser); // Firefox
alert(person['likes'].browser); // Firefox

If you access something with more than one matching element, you get back an XMLList:

alert(person.likes.language.length()); // 2

As with the DOM, * can be used to access all child nodes:

alert(person.likes.*.length()); // 4

While the . operator accesses direct children of the given node, the .. operator accesses all descendants, no matter how deeply nested:

alert(person..*.length()); // 11

The length() method here returns 11 because both elements and text nodes are included in the resulting XMLList.
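To see where the counts of 4 and 11 come from, the same tree can be modeled with plain arrays in standard JavaScript. This is only an illustrative sketch, not E4X: the array model (tag name first, then children, with strings standing for text nodes) and the names personTree and countDescendants are assumptions made for this example.

```javascript
// Hypothetical plain-array model of the <person> document (not E4X):
// each element is [tagName, ...children]; strings stand for text nodes.
var personTree = ['person',
    ['name', 'Bob Smith'],
    ['likes',
        ['os', 'Linux'],
        ['browser', 'Firefox'],
        ['language', 'JavaScript'],
        ['language', 'Python']]];

// Counts every descendant node, elements and text nodes alike.
function countDescendants(node) {
    if (typeof node === 'string') {
        return 0; // text nodes have no children
    }
    var total = 0;
    var children = node.slice(1); // skip the tag name
    for (var i = 0; i < children.length; i++) {
        total += 1 + countDescendants(children[i]);
    }
    return total;
}

var descendantCount = countDescendants(personTree);  // analogous to person..*.length()
var likesChildCount = personTree[2].slice(1).length; // analogous to person.likes.*.length()
```

The six elements below <person> plus their five text nodes give 11, while <likes> has four direct children.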

Objects representing XML elements provide a number of useful methods, some of which are illustrated below:

alert(person.name.text()); // Bob Smith

var xml = person.name.toXMLString(); // A string containing XML

var personCopy = person.copy(); // A deep copy of the XML object

var child = person.child(1); // The second child node; in this case the <likes> element

Working with XMLLists

In addition to the XML object, E4X introduces an XMLList object. XMLList is used to represent an ordered collection of XML objects; for example, a list of elements. Returning to the languages example from earlier, we can access an XMLList of the <lang> elements in the document as follows:

 var langs = languages.lang;

XMLList provides a length() method which can be used to find the number of contained elements:

 alert(languages.lang.length());

Note that, unlike with JavaScript arrays, length is a method, not a property, and must be called using length().

We can iterate through the matching elements like so:

 for (var i = 0; i < languages.lang.length(); i++) {
     alert(languages.lang[i].toString());
 }

Here we are using identical syntax to that used to access numbered items in an array. Despite these similarities to regular arrays, XMLList does not support Array methods such as forEach, and Array generics such as Array.forEach() are not compatible with XMLList objects.

We can also use the for each...in statement introduced in JavaScript 1.6 as part of JavaScript's E4X support:

 for each (var lang in languages.lang) {
     alert(lang);
 }

for each...in can also be used with regular JavaScript objects to iterate over the values (as opposed to the keys) contained in the object. As with for...in, using it with arrays is strongly discouraged.
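Because for each...in was removed from engines along with E4X, iterating over the values of a regular object is now usually written with standard constructs. The following sketch substitutes Object.values() and for...of, which are standard ECMAScript rather than part of E4X; the counts object is a made-up example.

```javascript
// Standard-JavaScript substitute for `for each (var value in counts)`.
// The `counts` object is illustrative only.
var counts = { apples: 3, pears: 5 };
var values = [];

// Object.values() returns the object's own enumerable property values;
// for...of then visits each value rather than each key.
for (var value of Object.values(counts)) {
    values.push(value);
}
```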

It is possible to create an XMLList using XML literal syntax without needing to create a well-formed XML document, using the following syntax:

 var xmllist = <>
   <lang>JavaScript</lang>
   <lang>Python</lang>
 </>;

The += operator can be used to append new elements to an XMLList within a document:

 languages.lang += <lang>Ruby</lang>;

Note that unlike node lists returned by regular DOM methods, XMLLists are static and are not automatically updated to reflect changes in the DOM. If you create an XMLList as a subset of an existing XML object and then modify the original XML object, the XMLList will not reflect those changes; you will need to re-create it to get the most recent updates:

 var languages = <languages>
   <lang>JavaScript</lang>
   <lang>Python</lang>
 </languages>;
 
 var lang = languages.lang;
 alert(lang.length()); // Alerts 2
 
 languages.lang += <lang>Ruby</lang>;
 alert(lang.length()); // Still alerts 2
 
 lang = languages.lang; // Re-create the XMLList
 alert(lang.length()); // Alerts 3

Searching and filtering

E4X provides special operators for selecting nodes within a document that match specific criteria. These filter operations are specified using an expression contained in parentheses:

var html = <html>
  <p id="p1">First paragraph</p>
  <p id="p2">Second paragraph</p>
</html>;

alert(html.p.(@id == "p1")); // Alerts "First paragraph"

Nodes matching the path before the expression (in this case the paragraph elements) are added to the scope chain before the expression is evaluated, as if they had been specified using the with statement.

Consequently, filters can also run against the value of a single node contained within the current element:

var people = <people>
  <person>
    <name>Bob</name>
    <age>32</age>
  </person>
  <person>
    <name>Joe</name>
    <age>46</age>
  </person>
</people>;

alert(people.person.(name == "Joe").age); // Alerts 46

Filter expressions can even use JavaScript functions:

function over40(i) {
    return i > 40;
}

alert(people.person.(over40(parseInt(age, 10))).name); // Alerts Joe

Handling namespaces

E4X is fully namespace-aware. Any XML object that represents a node or attribute provides a name() method which returns a QName object, allowing easy inspection of namespaced elements.

Default

default xml namespace = "http://www.w3.org/1999/xhtml";
// No need now to specify a namespace in the html tag
var xhtml = <html><head><title></title></head><body>
            <p>text</p></body></html>;
alert(xhtml.head); // No need to specify a namespace on subelements here either

Non-default

var xhtml = <html xmlns="http://www.w3.org/1999/xhtml">
	<head>
		<title>Embedded SVG demo</title>
	</head>
	<body>
		<h1>Embedded SVG demo</h1>
		<svg xmlns="http://www.w3.org/2000/svg" 
			viewBox="0 0 100 100">
			<circle cx="50"
				cy="50"
				r="20"
				stroke="orange"
				stroke-width="2px"
				fill="yellow" />
		</svg>
	</body>
</html>;

alert(xhtml.name().localName); // Alerts "html"
alert(xhtml.name().uri); // Alerts "http://www.w3.org/1999/xhtml"

To access elements that are within a non-default namespace, first create a Namespace object encapsulating the URI for that namespace:

var svgns = new Namespace('http://www.w3.org/2000/svg');

This can now be used in E4X queries by using namespace::localName in place of a normal element specifier:

var svg = xhtml..svgns::svg;
alert(svg); // Shows the <svg> portion of the document

Using Generators/Iterators with E4X

As of JavaScript 1.7, it is possible to use generators and iterators, giving more options for traversing E4X.

In a manner akin to DOM tree walkers, we can define our own walkers for E4X. While the following is already achievable by iterating an E4X object with for each...in, it demonstrates how a more customized one could be created.

function xmlChildWalker (xml) {
    var i = 0;
    var child = xml.*[0];
    while (child != undefined) {
        yield child;
        child = xml.*[++i];
    }
    yield false;
}

var a = <a><b/><c/></a>;
var xcw = xmlChildWalker(a);

var child;
while ((child = xcw.next()) !== false) {
    alert(child.toXMLString()); // "<b/>" then "<c/>"
}
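The walker above relies on SpiderMonkey's legacy generator syntax (yield inside an ordinary function, with next() returning the value directly), which is no longer available in current engines. As a rough sketch of the same pattern with standard generators, the following uses function* and for...of over a plain object tree; the tree shape (name and children properties) and the name childWalker are assumptions standing in for an XML object.

```javascript
// Sketch of the child-walker pattern using a standard generator.
// The plain-object tree below is a hypothetical stand-in for <a><b/><c/></a>.
function* childWalker(node) {
    for (var i = 0; i < node.children.length; i++) {
        yield node.children[i]; // hand back one direct child at a time
    }
}

var tree = {
    name: 'a',
    children: [
        { name: 'b', children: [] },
        { name: 'c', children: [] }
    ]
};

var visited = [];
for (var child of childWalker(tree)) {
    visited.push(child.name);
}
```

With standard generators the loop ends when the generator finishes, so no sentinel value such as false needs to be yielded.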
