Writing JavaScript for XHTML

  • Creator: Manuel Strehl

Website authors have been writing XHTML files instead of HTML 4.01 for about seven years now. But alas, almost no XHTML file viewed over the web is served with the correct MIME type, that is, as application/xhtml+xml! One reason is a certain browser that cannot handle XHTML served as XML. Another is the experience that JavaScript authored carefully for HTML suddenly breaks in an XML environment.

This article shows some of the reasons, along with strategies to remedy the problems. It aims to encourage web authors to use more XML features and to make their JavaScript interoperable with real XHTML applications.

To test the following examples locally, use Firefox's extension switch: for local files, Firefox chooses between its HTML and XML handling by file extension. Just write an ordinary (X)HTML file and save it once as test.html and once as test.xhtml.

Problem: Nothing Works

After switching the MIME type, suddenly no inline script works anymore. Even the plain old alert() method seems to be gone. The code looks something like this:

<script type="text/javascript">
   //<!--
   window.alert("Hello World!");
   //-->
 </script>

Solution: The CDATA Trick

This problem usually arises when inline scripts are wrapped in comments. This was common practice in HTML, to hide scripts from browsers not capable of JavaScript. In the age of XML, comments are what they were intended to be: comments. Before the file is processed, all comments are stripped from the document, so enclosing your script in them is like throwing your lunch into a piranha pool.

In XML you will need a different notation. It looks like this:

<script type="text/javascript">
   //<![CDATA[
   window.alert("Hello World!");
   //]]>
 </script>

What happened? The comments are gone and the script is now enclosed in a so-called CDATA section (short for character data). Strictly speaking, this is not necessary: the script would work without it, as long as no comments are used. The reason for adding this special notation is to be able to use < and & safely in the script.
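To see why the CDATA section matters, consider a script body that actually uses < and &&. The following is a minimal sketch; countInRange is a hypothetical function invented for this illustration. The CDATA markers sit behind // so that a JavaScript engine reads them as ordinary comments:

```javascript
//<![CDATA[
// Hypothetical helper, invented for this illustration.
// It counts the values v with min <= v < max.
function countInRange(values, min, max) {
  var count = 0;
  for (var i = 0; i < values.length; i++) {
    // Outside a CDATA section, the < and && below would have to be
    // written as &lt; and &amp;&amp; to keep the XML well-formed.
    if (values[i] >= min && values[i] < max) {
      count++;
    }
  }
  return count;
}
//]]>
```

Placed between the script tags of an XHTML file, this body should be accepted by an XML parser exactly as written.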

A second remedy is very simple: Use only external scripts.

Problem: The DOM changed

The central object in the DOM, the document object, is of type HTMLDocument in HTML, whereas it is an XMLDocument in XML files. This has a huge impact on the methods JavaScript authors use in their daily work. Take the document.getElementsByTagName method, for example. This is a DOM 1 method, which means that it does not respect XML namespaces. Take a look at this common snippet:

var headings = document.getElementsByTagName("h1");
for( var i = 0; i < headings.length; i++ ) {
  doSomethingWith( headings[i] );
}

Enter the problem: in XHTML served as XML, all elements are in the XHTML namespace (remember the xmlns attribute in the html tag?). This means that our plain old DOM 1 method suddenly finds no elements anymore. Bang! At a rough guess, 80% of today's scripts on the web break immediately, including our snippet above.

Solution: Use DOM 2 methods

The W3C introduced DOM 2 to address the need to distinguish namespaces. Perhaps you have already come across a method like document.getElementsByTagNameNS? The difference is the NS part, which means that it takes namespaces into account. How do we use this method? It is straightforward:

var headings = document.getElementsByTagNameNS("http://www.w3.org/1999/xhtml","h1");
for( var i = 0; i < headings.length; i++ ) {
  doSomethingWith( headings[i] );
}

The only difference is that we name the namespace the element is in. Okay, that is more to type, but you can define shorthands. So let's use only DOM 2 methods from now on!

But wait! Back in our HTML file, the script breaks again! Remember, in HTML the elements are in no namespace at all! So what we have to do now is write a wrapper that determines whether we are dealing with an HTML or an XML file. Check out this piece of code:
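Such a shorthand could look like the following minimal sketch. The names XHTML_NS and byXHTMLTag are made up for this illustration; root is assumed to be the document or any element that offers getElementsByTagNameNS:

```javascript
// The XHTML namespace URI, stored once so it need not be retyped.
var XHTML_NS = "http://www.w3.org/1999/xhtml";

// Hypothetical shorthand: look up XHTML elements below 'root' by tag name.
function byXHTMLTag(root, tagName) {
  return root.getElementsByTagNameNS(XHTML_NS, tagName);
}
```

In a page, byXHTMLTag(document, "h1") would then stand in for the longer call shown above.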

// Add a helper to all nodes that picks the right lookup for the document type
Node.prototype.getHTMLByTagName = function(tagName) {
  if( document.contentType == "text/html" ) {
    return this.getElementsByTagName(tagName);
  } else {
    return this.getElementsByTagNameNS("http://www.w3.org/1999/xhtml",tagName);
  }
};

What does it do? It extends all nodes with a method getHTMLByTagName that picks the right lookup depending on the content type of the document. But there is an interoperability issue: in IE, you not only have to look at the document.mimeType property instead, you also cannot easily extend Node objects. So writing a wrapper that truly distinguishes between XML and HTML on the one hand and between different browsers on the other is a bit trickier. We leave this to you as an exercise.

NB: The DOM 1 method getElementsByTagName also exists in XML documents. It will find every element of a given name that is in no namespace at all. For this reason, AJAX's responseXML is often processed with DOM 1 methods without running into any problems. This is because very little XML sent via XMLHttpRequest bothers with namespaces.
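One way to sidestep the problem of extending Node objects is a plain function that takes the node as an argument. This is a minimal sketch, not a complete cross-browser solution: the function name, the doc parameter and the fallback to the DOM 1 method when getElementsByTagNameNS is missing are all assumptions made for illustration.

```javascript
var XHTML_NS = "http://www.w3.org/1999/xhtml";

// Hypothetical wrapper: 'root' is the document or an element to search in,
// 'doc' is the document whose content type decides which lookup is used.
function getXHTMLElements(root, tagName, doc) {
  var isHtml = doc.contentType === "text/html";
  var hasNamespaceLookup = typeof root.getElementsByTagNameNS === "function";
  if (isHtml || !hasNamespaceLookup) {
    // HTML document, or no DOM 2 support: use the DOM 1 method.
    return root.getElementsByTagName(tagName);
  }
  // XML document: ask for elements in the XHTML namespace.
  return root.getElementsByTagNameNS(XHTML_NS, tagName);
}
```

A call such as getXHTMLElements(document, "h1", document) would then work without touching any prototype.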


Problem: My Cookie won't be saved!

We have already found out that the document object in XML files is different from the one in HTML files. Now we take a look at one property that is missing in XML files and that we will miss very badly. In XML documents there is no document.cookie. That is, you can write something like

document.cookie = "key=value";

in XML as well, but you will find that literally nothing is saved in the cookie storage.

Solution: Use the Storage object

Firefox 2 introduced a new feature, the HTML 5 Storage object. Although this feature is not free of criticism, you can use it to work around the missing cookie if your document is of type XML. Again, you will have to write your own wrapper to handle every combination of MIME type and browser.
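A wrapper of this kind might start out like the sketch below. It makes several assumptions: createValueStore is a hypothetical name, the storage object is passed in by the caller and is assumed to offer getItem and setItem methods that work with strings, and the browser-specific part (which storage object to pass, and how IE reports the document type) is left out entirely.

```javascript
// Hypothetical wrapper: returns an object with set() and get() that uses
// cookies in HTML documents and the given storage object otherwise.
function createValueStore(doc, storage) {
  if (doc.contentType === "text/html") {
    return {
      set: function (key, value) {
        doc.cookie = encodeURIComponent(key) + "=" + encodeURIComponent(value);
      },
      get: function (key) {
        var pairs = doc.cookie ? doc.cookie.split("; ") : [];
        for (var i = 0; i < pairs.length; i++) {
          var eq = pairs[i].indexOf("=");
          if (eq < 0) {
            continue; // skip entries without a value
          }
          if (decodeURIComponent(pairs[i].slice(0, eq)) === key) {
            return decodeURIComponent(pairs[i].slice(eq + 1));
          }
        }
        return null;
      }
    };
  }
  // XML document: cookies are not available, so delegate to the storage object.
  return {
    set: function (key, value) { storage.setItem(key, String(value)); },
    get: function (key) { return storage.getItem(key); }
  };
}
```

Passing the document and the storage object in as arguments keeps the decision logic in one place and makes it easy to try out with stand-in objects.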

Problem: I Can't Use document.write()

This problem has the same cause as the one above: the method does not exist in XMLDocuments anymore. There are reasons for this decision, one being that a string of invalid markup would instantly break the whole document.

Solution: Use DOM Methods

Many people avoided DOM methods because of all the typing needed to create one simple element when document.write() was perfectly adequate. Now you can't get away with that so easily. Use DOM methods to create all of your elements, attributes and other nodes. This is XML-proof, as long as you keep the namespace problem in mind (for example, there is a document.createElementNS method).

Now, to be honest, you can still use strings as with document.write(), but it takes a little more effort. This code shows you how to do it:
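Creating a heading with namespace-aware DOM methods could look like this minimal sketch. The function name appendHeading and its parameters are hypothetical; doc stands for the document and parent for the element that should receive the heading:

```javascript
var XHTML_NS = "http://www.w3.org/1999/xhtml";

// Hypothetical helper: builds <h1>text</h1> in the XHTML namespace
// and appends it to 'parent'.
function appendHeading(doc, parent, text) {
  var heading = doc.createElementNS(XHTML_NS, "h1");
  // A text node is safe for any string: characters such as < and &
  // end up as text and are never parsed as markup.
  heading.appendChild(doc.createTextNode(text));
  parent.appendChild(heading);
  return heading;
}
```

In a page, appendHeading(document, body, "Hello World!") would add the heading, assuming body refers to the body element.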

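var string = '<div xmlns="http://www.w3.org/1999/xhtml"><h1>Hello World!</h1></div>';
var parser = new DOMParser();
// parseFromString returns a complete document, not a fragment
var parsedDocument = parser.parseFromString(string, "text/xml");
// import the root element into our own document before appending it
var importedNode = document.importNode(parsedDocument.documentElement, true);
body.appendChild(importedNode); // assuming 'body' is the body element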

But be aware that if your string is not well-formed XML (for example, you have an & where it should not be), this method will fail, leaving you with a parser error.
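A defensive check for such a parser error might look like the sketch below. It rests on an assumption that should be verified in the browsers you target: rather than throwing an exception, parseFromString typically hands back a document that contains a parsererror element. The function name parseMarkup is hypothetical, and the parser is passed in as an argument.

```javascript
// Hypothetical helper: returns the root element of the parsed markup,
// or null if the parser reported an error.
function parseMarkup(parser, markup) {
  var parsed = parser.parseFromString(markup, "text/xml");
  // "*" matches elements in any namespace, so the parsererror element
  // is found regardless of the namespace a browser puts it in.
  var errors = parsed.getElementsByTagNameNS("*", "parsererror");
  if (errors.length > 0) {
    return null;
  }
  return parsed.documentElement;
}
```

A caller would pass new DOMParser() as the first argument and import the returned element into the document only if the result is not null.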

Problem: My Favourite JS Library still Breaks

If you use JavaScript libraries like the famous prototype.js or Yahoo's library, there is bad news for you: as long as their developers don't apply the points mentioned above, you won't be able to use them in your XHTML applications served as XML.

Two options remain, but neither is very promising: take the library, recode it and publish it, or e-mail the developers, e-mail your friends to e-mail the developers and e-mail your customers to e-mail the developers. If they get the hint and are not too annoyed, perhaps they will start to implement XML features in their libraries.

I Read About E4X. Now, This Is Perfect, Isn't It?

As a matter of fact, it isn't. E4X is a new way of using and manipulating XML in JavaScript. But although it is standardized by ECMA, the standard lacks an interface that lets E4X objects interact with the DOM objects our document consists of. So for all the advantages E4X has, without a DOM interface you can't use it productively to manipulate your document.

Finally: Content negotiation

Now, how do we decide when to serve XHTML as XML? We can do this on the server side by evaluating the Accept header of the HTTP request. In PHP, for example, you would write something like this:

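// strpos() returns 0 for a match at the very start, so compare against false explicitly
if( isset( $_SERVER['HTTP_ACCEPT'] ) && strpos( $_SERVER['HTTP_ACCEPT'], "application/xhtml+xml" ) !== false ) {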
  header( "Content-type: application/xhtml+xml" );
  echo '<?xml version="1.0" ?>'."\n";
} else {
  header( "Content-type: text/html" );
}

This code also sends the XML declaration, which is strongly recommended when the document is served as XML. If the document is sent as HTML, an XML declaration would break IE's doctype switch.
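A plain substring test on the Accept header also matches a client that lists application/xhtml+xml with q=0, which means "not acceptable". The decision logic can be sketched a little more carefully, here in JavaScript to match the other examples. The function names are hypothetical, and the sketch only checks whether the type is listed with a quality above zero; it does not rank it against the other types in the header.

```javascript
// Hypothetical helper: true if the Accept header lists
// application/xhtml+xml with a quality value greater than zero.
function acceptsXhtml(acceptHeader) {
  if (!acceptHeader) {
    return false;
  }
  var ranges = acceptHeader.split(",");
  for (var i = 0; i < ranges.length; i++) {
    var parts = ranges[i].split(";");
    if (parts[0].trim().toLowerCase() !== "application/xhtml+xml") {
      continue;
    }
    var quality = 1; // a missing q parameter counts as q=1
    for (var j = 1; j < parts.length; j++) {
      var param = parts[j].trim().toLowerCase();
      if (param.indexOf("q=") === 0) {
        quality = parseFloat(param.slice(2));
      }
    }
    return quality > 0;
  }
  return false;
}

// Hypothetical helper: picks the Content-Type to send.
function chooseContentType(acceptHeader) {
  return acceptsXhtml(acceptHeader) ? "application/xhtml+xml" : "text/html";
}
```

The same steps translate directly to PHP or any other server-side language.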

Further Reading

You will find several useful articles in the developer wiki:

  • XML in Mozilla
  • DOM
  • XML Introduction
  • XML Extras

DOM 2 methods you will need are:

  • DOM:document.createElementNS
  • DOM:document.createNSResolver
  • DOM:document.getElementsByTagNameNS