Why the RSS Content Module Is Popular - Including HTML Contents

This is an archived page. It's not actively maintained.


RSS has long had the <description> element, which can be used to include the contents of an <item>. For example, you could use it to include the entire contents of a blog post, or just a summary of it. However, the RSS <description> element is only supposed to be used to include plain text data. This obviously limits you: since many people write in HTML, information and formatting are lost with the RSS <description> element.

However, it has become common practice to put XML-escaped HTML data in it (even though you are not supposed to). For example, if your blog post was:

    This is <b>bold</b>.

then the <description> would be:

   <description>This is &lt;b&gt;bold&lt;/b&gt;.</description>

Note that each "<" has been turned into "&lt;" and each ">" into "&gt;". This bloats the size of the contents, since every tag costs several extra characters, but it is necessary because certain characters (such as "<" and "&") are not allowed to appear unescaped in XML character data.
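To make the bloat concrete, here is a minimal Python sketch that entity-encodes the example with the standard library's `xml.sax.saxutils.escape` and compares lengths. The helper name `encode_for_description` is hypothetical; it is just a thin wrapper for illustration.

```python
from xml.sax.saxutils import escape


def encode_for_description(html: str) -> str:
    """Entity-encode HTML so it can sit inside <description> as character data.

    escape() replaces "&", "<" and ">" with "&amp;", "&lt;" and "&gt;".
    """
    return escape(html)


html = "This is <b>bold</b>."
encoded = encode_for_description(html)

# prints: <description>This is &lt;b&gt;bold&lt;/b&gt;.</description>
print(f"<description>{encoded}</description>")

# Each "<" or ">" grows from 1 character to 4, so the text gets longer.
print(len(html), "->", len(encoded))  # prints: 20 -> 32
```

Four angle brackets turn a 20-character post into 32 characters; tag-heavy HTML grows proportionally more.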

XML does, however, have the CDATA convention. You can put (almost) anything you want in a CDATA section, including "<" and ">", without having to escape them; the one sequence a CDATA section cannot contain is "]]>", since that ends the section. Using the example above we'd have:

   <description><![CDATA[This is <b>bold</b>.]]></description>
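A minimal Python sketch of CDATA wrapping follows. The helper `wrap_cdata` is hypothetical; it assumes the common trick of splitting any "]]>" in the text across two CDATA sections, and it uses the standard library's `xml.etree.ElementTree` only to check that a parser recovers the original text.

```python
import xml.etree.ElementTree as ET


def wrap_cdata(text: str) -> str:
    """Wrap text in a CDATA section (hypothetical helper).

    "]]>" would end the section early, so it is split across two sections:
    "]]" closes the first one and ">" starts the next.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


html = "This is <b>bold</b>."
snippet = f"<description>{wrap_cdata(html)}</description>"
print(snippet)  # <description><![CDATA[This is <b>bold</b>.]]></description>

# A parser sees only the original text; the CDATA markers are not part of it.
assert ET.fromstring(snippet).text == html

# Text containing the forbidden sequence still round-trips.
tricky = "a ]]> b"
assert ET.fromstring(f"<d>{wrap_cdata(tricky)}</d>").text == tricky
```

Unlike entity encoding, the HTML inside the CDATA section stays byte-for-byte the same, which is where the size saving comes from.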

This helps reduce the bloat. However, the <description> element is NOT supposed to be used for any of this; it is only supposed to be used to include plain text. So RSS still leaves us without a way to include HTML contents. The RSS Content Module fills this gap.

NOTE: Do not put anything but plain text into the RSS <description> element. Although it has become common practice to put non-plain-text data in the RSS <description> element, it is not actually allowed. The RSS 2.0 specification, however, clearly states that "entity-encoded HTML is allowed" and even provides examples showing exactly the syntax above (using CDATA and unencoded HTML). The wording of this note should be reconsidered.

An example using the most popular element of the RSS Content Module is shown below:

   <?xml version="1.0"?>

   <rss version="2.0"
        xmlns:content="http://purl.org/rss/1.0/modules/content/">

       <channel>
           <title>Example</title>
           <link>http://www.example.com/</link>
           <description>An RSS Example with the Content Module</description>
           <lastBuildDate>Sun, 15 May 2005 13:02:08 -0500</lastBuildDate>

           <item>
               <title>A Link in Here</title>
               <pubDate>Sun, 15 May 2005 13:02:08 -0500</pubDate>
               <content:encoded><![CDATA[This is a <a href="http://example.com/">link</a>.]]></content:encoded>
           </item>

           <item>
               <title>Some Italics HTML</title>
               <pubDate>Sun, 15 May 2005 10:55:12 -0500</pubDate>
               <content:encoded><![CDATA[This is <i>italics</i>.]]></content:encoded>
           </item>

           <item>
               <title>Some Bold HTML</title>
               <pubDate>Sun, 15 May 2005 08:14:11 -0500</pubDate>
               <content:encoded><![CDATA[This is <b>bold</b>.]]></content:encoded>
           </item>
       </channel>
   </rss>
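If you generate feeds from code, a feed like the one above could be produced along the lines of this minimal Python sketch using the standard library's `xml.etree.ElementTree`. The function `build_feed` and the channel title and link values are hypothetical placeholders. Note that ElementTree cannot emit CDATA sections, so it entity-encodes the HTML instead; both forms carry the same text.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the RSS Content Module.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"
# Makes ElementTree write the "content:" prefix instead of "ns0:".
ET.register_namespace("content", CONTENT_NS)


def build_feed(items):
    """Build an RSS 2.0 tree; items is a list of (title, pubDate, html) tuples."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Example"  # hypothetical placeholder
    ET.SubElement(channel, "link").text = "http://www.example.com/"  # hypothetical
    ET.SubElement(channel, "description").text = "An RSS Example with the Content Module"
    for title, pub_date, html in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "pubDate").text = pub_date
        # ElementTree entity-encodes "<", ">" and "&" here rather than using CDATA.
        ET.SubElement(item, f"{{{CONTENT_NS}}}encoded").text = html
    return rss


feed = build_feed([
    ("Some Bold HTML", "Sun, 15 May 2005 08:14:11 -0500", "This is <b>bold</b>."),
])
output = ET.tostring(feed, encoding="unicode")
print(output)
```

The xmlns:content declaration is added to the root element automatically because an element in that namespace appears in the tree.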


The <content:encoded> element is the reason that the RSS Content Module is popular. It is used to include the HTML contents of an <item>: in effect, an HTML <description>.
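On the reading side, a feed consumer would typically prefer <content:encoded> and fall back to <description> when it is absent. The following is a minimal Python sketch of that policy using `xml.etree.ElementTree`; the function `item_bodies` and the trimmed sample feed are hypothetical, and the fallback order is an assumption about what a reader wants, not something the module mandates.

```python
import xml.etree.ElementTree as ET

# Maps the "content" prefix used in find() paths to the module's namespace URI.
NS = {"content": "http://purl.org/rss/1.0/modules/content/"}


def item_bodies(feed_xml: str):
    """Yield (title, html) for each item, preferring <content:encoded>.

    Falls back to <description> when an item has no <content:encoded>.
    """
    root = ET.fromstring(feed_xml)
    for item in root.iterfind("channel/item"):
        title = item.findtext("title", default="")
        body = item.findtext("content:encoded", namespaces=NS)
        if body is None:
            body = item.findtext("description", default="")
        yield title, body


# Trimmed, hypothetical feed: one item with content:encoded, one without.
FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>Some Bold HTML</title>
      <content:encoded><![CDATA[This is <b>bold</b>.]]></content:encoded>
    </item>
    <item>
      <title>Plain</title>
      <description>Just text.</description>
    </item>
  </channel>
</rss>"""

for title, html in item_bodies(FEED):
    print(title, "->", html)
```

Because the parser resolves CDATA and entities alike, the consumer gets the same HTML string whichever encoding the publisher chose.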

NOTE: Strictly speaking, the RSS Content Module and <content:encoded> are not quite being used correctly here. The module was designed for RSS 1.0, and using it as specified requires intertwining it with RDF's XML serialization. But this is how it has become common practice to use it.