Summary: Document validation is the single most important tool an author has available. Validating not only makes sure your documents are well-formed, but also makes them more robust and ready for the future. Get the details on common validation errors and how to fix your documents so you can avoid them.
Like a nation or a house, a page divided against itself cannot stand-- not in standards-compliant browsers, anyway. Every page has a structure, and it turns out that if you aren't careful with your construction methods, the structure will be weakened, flawed, and potentially dangerous. If you've ever loaded up a page in Opera or Netscape 6 or Internet Explorer and had it look totally mangled, odds are that you've inadvertently built a shaky structure.
Imagine building a house on a foundation of sand, or with rubber support beams. Most people wouldn't even bother, and anyone who did shouldn't be surprised by huge cracks in the walls, wildly uneven flooring, or even total collapse of the structure. Yet many authors are shocked to discover that their pages fall apart in recent browsers. The usual reaction is "the page was fine before!", which is exactly like saying "my rubber-column house didn't collapse on the same day it was built!" Perhaps not, but it was always in danger of falling over.
So how does one ensure a good, solid Web house? Well-structured markup. A clean document structure is absolutely essential to ensuring that your pages will behave in browsers both present and future. Fortunately, fixing up a page's structure after it's been built is a lot easier and less expensive than trying to correct structural flaws in a house! In fact, there are HTML validators out there that can help you identify the problems and quickly correct them. We highly recommend the World Wide Web Consortium's HTML Validator-- not only because it's provided by the same people who are responsible for the HTML and XHTML specifications, but also because most of its error messages provide a link to an explanation of what the error means. Eventually, of course, you'll recognize what each error message means without having to look up the explanation, but when you're starting out these help files are invaluable.
Your goal is simple: to bring your page to a state where it doesn't generate any errors at all. For bonus points, you could try to eliminate any warnings as well, but the important thing is to avoid having errors. There are, practically speaking, two general kinds of errors:
- Errors involving elements, which are the most serious and can really mangle a page if left uncorrected. For example, an error like "element 'TD' not allowed here" implies that you either have a TD outside of a TABLE element, or else the validator thinks you do. Either way it's a major problem, and finding out why should be a top priority. An element error is equivalent to a contractor telling you that he left some critical support beams out of your house.
- Errors involving attributes, which are less serious since most browsers will ignore any attribute they don't understand. This is not to say that attribute errors can be ignored, but they are generally less of a concern than element errors.
As you fix your markup to remove one error, you may find that you generate more-- or that suddenly several other errors go away. For example, if you add a missing end-table tag (</table>) to a document, you might fix every "element not allowed here" error that followed. In any case, the goal of every author should be to have no errors at all of either kind.
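As a hypothetical illustration of that cascade, the fragment below leaves out the end-table tag (broken and fixed versions are shown one after the other for comparison; the table content is placeholder text). Checked against an HTML 4.0 DTD, the validator would typically object to the paragraph, which isn't allowed inside the unclosed table, as well as to the missing end tag; adding </table> clears both.

```html
<!-- Broken: the table is never closed, so the paragraph ends up inside it -->
<table>
  <tr><td>Item</td><td>Price</td></tr>
<p>Prices subject to change.</p>

<!-- Fixed: the end-table tag comes first, so the paragraph follows the table -->
<table>
  <tr><td>Item</td><td>Price</td></tr>
</table>
<p>Prices subject to change.</p>
```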
DOCTYPE and Validation
When you validate your document, you have the option to pick which Document Type Definition (DTD) you want to use as the standard. There are many options available, from HTML 2.0 up to the most recent standard available (it was XHTML 1.1 when we wrote this article). If you want your pages to work in today's browsers, then the best choice is a recent DTD. Given the generally backwards-compatible nature of HTML and XHTML, validating against a recent DTD should mean you'll be all right if older browsers drop by.
Rather than picking a DTD from the provided list, you can also place a DOCTYPE declaration at the top of your document, thus marking it as using a specific DTD. Let's say you wanted to use the HTML 4.0 Strict DTD. In that case, the very first line of your document (even before the <html> tag) should be:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
Once you've added this declaration to the top of your document, you can use the Document Type option "(specified inline)" and the W3C validator will use the DTD you've declared in your document to validate the markup.
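A minimal document built this way might look like the sketch below. Everything other than the DOCTYPE line is placeholder content; the TITLE element is included because the HTML 4.0 DTDs require one in every document.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
<html>
  <head>
    <!-- HTML 4.0 requires exactly one TITLE in the HEAD -->
    <title>Validation Test Page</title>
  </head>
  <body>
    <p>This page declares the HTML 4.0 Strict DTD on its very first line.</p>
  </body>
</html>
```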
It is also the case that recent browsers (Netscape 6, Explorer 5 for Macintosh, Explorer 6 for Windows) will use the DOCTYPE declaration to determine the "rendering mode" you want the browser to use when displaying your document. Generally speaking, any "transitional" or "loose" DTD, or the lack of a DOCTYPE altogether, will cause browsers to use a rendering mode that emulates legacy browser behavior. "Strict" DTDs, on the other hand, will switch browsers into a standards-compliant rendering mode. This gives authors an easy way to decide how they want browsers to handle their markup. The Apple Developer Connection has an article called "DOCTYPE Explained" that covers this territory in more detail; note that Internet Explorer 6 for Windows also supports the "DOCTYPE switching" described in the article.
There are a few errors that authors will likely see many, many times as they validate pages. There are also a few things that a validator might not catch (software is generally as perfect as the humans who write it). Here are a few of the most common errors and pitfalls to avoid.
Forgetting Important Attributes
If you get an attribute-related error, it's very likely going to tell you that you forgot to include a required attribute. These include:
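- type attribute for the SCRIPT and STYLE elements
- alt attribute for the IMG and AREA elements
- summary attribute for the TABLE element (strictly speaking, the DTD doesn't require this one, so the validator won't flag its absence, but it is well worth including)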
The latter two attributes are important for accessibility reasons, as their inclusion assists users who are using text-only or audio browsers. The first attribute we mention, type, is critical for forward compatibility. As an example, many browsers (including Netscape 6) will ignore any STYLE element that has no type attribute, which has the usually unwanted effect of disabling the entire stylesheet.
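To make the list of attributes concrete, here is a sketch of each one in place. The file name, style rule, and table data are hypothetical; in a real document the STYLE element sits in the HEAD, while the image and table go in the BODY.

```html
<!-- In the HEAD: STYLE names its stylesheet language with type -->
<style type="text/css">
  p { color: navy; }
</style>

<!-- In the BODY: alt gives text-only and audio browsers something to present -->
<p><img src="logo.gif" alt="Example Corp. logo"></p>

<!-- summary describes the table's purpose to non-visual browsers -->
<table summary="Monthly rainfall totals, one row per month">
  <tr><td>January</td><td>80 mm</td></tr>
</table>
```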
A related situation is that the strict DOCTYPEs for HTML and XHTML do not permit the language attribute on the SCRIPT element, so the type attribute is the only way to mark what kind of script is being used. Thus, if a SCRIPT element in such a document uses the language attribute, the validator is quite likely to throw an error. You can fix this by replacing language with an appropriate type value.
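For instance, a SCRIPT element that the strict DTDs reject, followed by a corrected version, might look like this sketch (the one-line script body is just a placeholder):

```html
<!-- Rejected under a strict DOCTYPE: language isn't part of the Strict DTDs -->
<script language="JavaScript">
  var greeting = "Hello";
</script>

<!-- Accepted: type identifies the scripting language instead -->
<script type="text/javascript">
  var greeting = "Hello";
</script>
```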
Besides the potential problems centered around the language attribute, there are a few other ways in which scripts can cause you trouble when validating your HTML.
If your script contains any HTML tags inside string values, then make sure to escape the forward slash in each end tag with a backslash. For example, you need to write var docEle = "<html><\/html>" (note the backslash in front of the slash) in order to prevent validation problems, because the sequence </ can otherwise be read as the end of the SCRIPT element itself. This is a good practice in any case.
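Placed in a complete SCRIPT element, the escaped string from that example looks like this. JavaScript treats a backslash followed by a slash as a plain slash, so the value of docEle is unchanged by the escaping.

```html
<script type="text/javascript">
  // The backslash keeps the end tag in this string from being read
  // as the end of the SCRIPT element; JavaScript treats a backslash
  // followed by a slash as a plain slash, so the string is unchanged.
  var docEle = "<html><\/html>";
</script>
```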
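You should also enclose the contents of your SCRIPT element in an HTML comment. This is often done for both SCRIPT and STYLE elements, so you may not encounter this problem. The usual approach is to open the comment on the line right after the start tag and close it on the line right before the end tag, with a JavaScript line comment (//) placed in front of the closing comment marker.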
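A sketch of that pattern for both a SCRIPT and a STYLE element follows; the script body and style rule are placeholders. The // in front of the closing comment marker is there so the JavaScript engine doesn't try to parse that marker as script. CSS simply ignores the comment markers, so the STYLE version needs no extra prefix.

```html
<script type="text/javascript">
<!--
  // Browsers that don't recognize SCRIPT would otherwise show this code as text.
  var greeting = "Hello";
// -->
</script>

<style type="text/css">
<!--
  p { margin: 0; }
-->
</style>
```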
Improper Nesting of Elements
Over the years, authors have developed a number of tricks that get the effects they want with a minimum of typing, and which avoid certain display effects. Unfortunately, most of these are based on wholly invalid markup and will cause a validator to choke. They'll also lead to display and functionality problems in standards-compliant browsers like Netscape 6 and Internet Explorer 6 (in "strict" mode), so they need to be fixed anyway.
One very common example is wrapping a FONT element around one or more paragraphs, tables, or other block-level elements. As it happens, FONT is an inline element, and therefore cannot contain block-level elements. So the following markup is structurally incorrect:
<font color="red"> <p>Hey, paragraphs can't be inside font elements!</p> </font>
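One way to restructure that fragment is to move FONT inside the paragraph, where an inline element belongs, or to drop it in favor of a CSS declaration on the paragraph itself. Note that the FONT version validates only against a transitional DTD; the HTML 4.0 Strict DTD doesn't include FONT at all.

```html
<!-- Valid under a transitional DTD: the inline FONT sits inside the block-level P -->
<p><font color="red">Hey, font elements can be inside paragraphs!</font></p>

<!-- Valid under Strict as well: style the paragraph itself with CSS -->
<p style="color: red;">Hey, paragraphs can be colored directly!</p>
```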
It's exactly the same if you wrap a FONT element around a table. If you must color all of the text in your table, and you feel you must use FONT to do it, then you'll have to put the font elements inside each cell of the table. Of course, CSS makes this a lot easier:
<table style="color: red;"> <tr><td>(...table content here...)</td></tr> </table>
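For comparison, the FONT-only approach would look something like this sketch, with the element repeated in every cell. The cell contents are placeholders, and again this validates only against a transitional DTD.

```html
<!-- Valid under a transitional DTD, but FONT has to be repeated in every cell -->
<table>
  <tr>
    <td><font color="red">Apples</font></td>
    <td><font color="red">Oranges</font></td>
  </tr>
</table>
```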
On a related note, some authors like to avoid the "white space" that the FORM element introduces inside table cells by doing something like this:
<table> <form action="script.cgi" method="get"> <tr><td>(...form widgets here...)</td></tr> </form> </table>
That will trigger an error because you can't put FORM inside a table but outside a table cell. You could wrap the FORM element around the entire table, or put the form into a table cell and use CSS to set its margins to zero-- but in that case the entire form would have to be placed within that single table cell. If you're using a table to lay out your form, then you need to wrap the form around the whole table, or around an entire section of the document if that's feasible.
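Both valid arrangements are sketched here, reusing the placeholder action and widget text from the broken example. In the second version the widgets are wrapped in a DIV, because the Strict DTDs allow only block-level content (or SCRIPT) directly inside FORM.

```html
<!-- Option 1: FORM wraps the entire layout table -->
<form action="script.cgi" method="get">
  <table>
    <tr><td>(...form widgets here...)</td></tr>
  </table>
</form>

<!-- Option 2: the whole form sits in one cell, with its margins removed -->
<table>
  <tr>
    <td>
      <form action="script.cgi" method="get" style="margin: 0;">
        <div>(...form widgets here...)</div>
      </form>
    </td>
  </tr>
</table>
```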
Inconsistent Case in Class and ID Values
Despite the fact that HTML has historically been case-insensitive, many values in modern HTML and XHTML (as well as XML) are quite case-sensitive. This includes class names and ID values. Thus, ExternalLink is not the same as externalLINK or even externallink. Standards-compliant browsers such as Netscape 6 enforce the case sensitivity of class and ID names. However, the HTML validator does not check case in values against other instances of the same values, either in the document or in any associated scripts or stylesheets, and so will not catch any inconsistencies that might lead to trouble in page display. For more information on this point, please see the Tech Note "Case Sensitivity in
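The following hypothetical fragment shows how a case mismatch can slip through: both links validate, but only the first matches the class selector in a browser that enforces case sensitivity. The class name comes from the discussion of ExternalLink; the URL is a placeholder, and the STYLE element is shown next to the markup only for brevity (it belongs in the document's HEAD).

```html
<style type="text/css">
  /* Matches class="ExternalLink" only, with exactly this capitalization */
  .ExternalLink { color: green; }
</style>

<!-- Styled: the class value's case matches the selector -->
<p><a class="ExternalLink" href="http://www.example.com/">Example link</a></p>

<!-- Valid, but not styled by a browser that enforces case sensitivity -->
<p><a class="externallink" href="http://www.example.com/">Example link</a></p>
```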
Comment Formatting
Although it may seem picky, it is important to be sure that you format your HTML comments correctly. The correct form of an HTML comment is:
<!-- comment -->
That's two dashes at either end, not three as some authors like to include. In general, you should avoid any run of two or more dashes inside a comment, and use only the required pair of dashes to mark the beginning and end of the comment. (See HTML 4.01, section 3.2.4 for more information.)
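Here is a short sketch of the difference: the first comment is correctly formed, while the other two show patterns to avoid.

```html
<!-- Correct: exactly two dashes at each end -->

<!--- Avoid: three dashes at each end --->

<!-- Avoid: a pair of dashes -- inside the comment text -->
```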
Ampersands in URLs
Because the ampersand character (&) is reserved for marking character entities, authors should never use raw ampersands in their HTML source-- and that includes ampersands inside URLs! Thus, any ampersand a URL needs should be written as the character entity &amp; instead.
Each instance of &amp; will be translated by a Web browser into an ampersand, without triggering validation warnings.
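For example, a link whose (hypothetical) URL passes two query parameters would be written as shown below; when the link is followed, the browser still requests the URL with a plain ampersand between query=validation and page=2.

```html
<!-- Each "&" between query parameters is written as the entity "&amp;" -->
<p><a href="http://www.example.com/search.cgi?query=validation&amp;page=2">Page 2 of the results</a></p>
```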
Attribute Value Presence and Quotation
If you're validating against an XHTML DOCTYPE, then all of your attributes must have values, and all of these values must be enclosed in quotation marks. You must also close every element you open; for empty elements such as BR and IMG, which have no separate end tag, end the tag itself with a forward slash. These are requirements of XHTML (and of XML-based languages in general), so the validator will flag any instance where you do not follow these rules. One example of valid XHTML markup that differs noticeably from historical HTML:
<input type="checkbox" checked="checked" name="prefSys" value="MacOS" />
Note the addition of a (quoted) value to checked and the slash at the end of the tag. Without these additions, this markup fragment would not be valid XHTML.
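A few more XHTML patterns follow the same rules, sketched below with placeholder content: empty elements such as BR and IMG close themselves, and selected, like checked, takes its own name as a quoted value. The space before each slash follows the HTML compatibility guidelines in the XHTML 1.0 specification, which recommend it so that older HTML browsers cope with the syntax.

```html
<p>
  <!-- Empty elements close themselves; the space before the slash helps older browsers -->
  First line<br />
  <img src="logo.gif" alt="Example Corp. logo" />
  <!-- Like checked, selected takes its own name as a quoted value -->
  <select name="prefSys">
    <option value="MacOS" selected="selected">Mac OS</option>
    <option value="Windows">Windows</option>
  </select>
</p>
```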
Although it may seem like more work at first, validating your markup now will pay off handsomely in saved time and effort later. Not only will your documents stand a much better chance of being properly displayed in all current and future browsers, but it will be much easier to maintain your documents, or even to convert them from HTML to another markup language such as XML.
Although the ideal goal is to have pages that generate no validation errors and no warnings, your primary concern should be the elimination of actual errors. Similarly, you should be more concerned about element errors than about attribute errors, although you really can't afford to ignore either kind. Once you've cleaned things up so that you no longer get errors, then you can turn to the task of styling the document and feel confident that the page will display in just about any known browser, as well as any decent browser to come.
Original Document Information
- Author(s): Eric A. Meyer, Netscape Communications
- Last Updated Date: Published 05 Mar 2001
- Copyright Information: Copyright © 2001-2003 Netscape. All rights reserved.
- Note: This reprinted article was originally part of the DevEdge site.