<content:encoded> is ridiculous
It’s a never-approved draft of an extension to a long-dead branch of RSS history, used mostly to solve a non-existent problem that it doesn’t even solve.
Draft: started • Tagged /rss
You often see <content:encoded> on items in RSS feeds.
It will almost always have the same content as its <description> sibling.
This is dumb, serving absolutely no purpose. The W3C Feed Validator even has a rule, “Ensure description precedes content:encoded”, because “some consumers will simply pick up the last such element”. (Personal opinion: that warning is probably long obsolete, and there are probably also some consumers that simply pick the first such element.)
The first explanation for its purpose: description is plain text
This is a popular explanation.
Until MDN deleted its RSS stuff, there was a page of uncertain age: “Why RSS Content Module is Popular - Including HTML Contents” (sources: a Wayback Machine capture of MDN from 2015; it got archived in about 2019; copies also survive on devdoc.net and in the UDN archive).
The <content:encoded> element is the reason that the RSS Content Module is popular. This element is used to include an HTML <description>.
Between 2015-10-12 and 2017-01-18 someone partially corrected this (probably p3k on 2016-05-13), but the rest of the article was unchanged.
The two different purposes
You’ll find two explanations justifying <content:encoded> on RSS 2.0 feeds:
<description> is supposed to be plain text, <content:encoded> is HTML.
This has been used extensively as an argument (e.g. https://udn.realityripple.com/docs/Archive/RSS/Article/Why_RSS_Content_Module_is_Popular_-_Including_HTML_Contents), but I’m not sure if it was ever actually true.
<description> was only plain text for a short time;
it was introduced as plain text in 0.91 (1999-07), people started putting HTML in it anyway,
and 0.92 (2000-12) allowed it to contain HTML.
RDF Site Summary 1.0 (2000-12) had <description> without specifying its format, but no one used that spec.
Even if this argument was ever true,
I think it was probably obsolete before 2004.
I think everything just started assuming it was HTML,
possibly adding the equivalent of white-space: pre-wrap if it looked like maybe it was text.
That damages the occasional piece of content, but not much.
Providing the full content, in addition to the summary in <description>.
This, I believe, is the real answer.
The content:encoded element can be used in conjunction with the description element to provide an item's full content along with a shorter summary. Under this approach, the complete text of the item is presented in content:encoded and the summary in description.
— RSS Best Practices Profile
In practice, almost no one actually wants to store both:
their feeds will contain either a summary or the full content.
Whichever it is, it will be stored in <description>,
and if there’s a <content:encoded>, it will be the same content,
making it useless.
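A rough way to check this claim against a feed is to compare the two elements after XML decoding. The following is a minimal sketch, not a survey tool: the feed is invented for illustration, the function name redundant_items is hypothetical, and it assumes the conventional Content module namespace URI http://purl.org/rss/1.0/modules/content/.

```python
import xml.etree.ElementTree as ET

# Assumption: the namespace URI conventionally bound to the "content" prefix.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"

# Hypothetical feed, invented for illustration.
FEED = """<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example</title>
    <item>
      <title>Hello</title>
      <description>&lt;p&gt;Hello, world!&lt;/p&gt;</description>
      <content:encoded><![CDATA[<p>Hello, world!</p>]]></content:encoded>
    </item>
    <item>
      <title>Summary only</title>
      <description>Just a summary.</description>
    </item>
  </channel>
</rss>"""


def redundant_items(feed_xml):
    """Return titles of items whose content:encoded repeats description."""
    root = ET.fromstring(feed_xml)
    titles = []
    for item in root.iter("item"):
        description = item.findtext("description")
        encoded = item.findtext(f"{{{CONTENT_NS}}}encoded")
        if description is None or encoded is None:
            continue  # nothing to compare
        # Both values are already XML-decoded, so entity-encoded and
        # CDATA-escaped copies of the same markup compare equal.
        if description.strip() == encoded.strip():
            titles.append(item.findtext("title"))
    return titles


print(redundant_items(FEED))  # ['Hello']
```

The comparison is deliberately strict (only surrounding whitespace is ignored), so a feed that reformats the markup slightly between the two elements would not be flagged.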
It’s a relic of the RDF branch of history
For detail, see my history of RSS.
But the key points are:
A Netscape developer’s initial vision was to use RDF.
This proved impractical at the time,
but due to time pressure it was still released as RSS (RDF Site Summary) 0.9,
even though the only piece of RDF in it was the root element’s qualified name.
Netscape and everyone else quickly gave up on the RDF direction, with RSS (Rich Site Summary) 0.91, RSS (it’s just a name) 0.91, and eventually RSS (Really Simple Syndication) 2.0.
A handful of people thought the RDF thing had been a good idea.
Between the releases of RSS 0.91 and 0.92,
they forked RSS 0.9 and doubled down on RDF.
They called themselves “RSS-DEV Working Group”.
They were not affiliated with anything like IETF or W3C.
When they released RSS (RDF Site Summary) 1.0,
all it really added was confusion and trouble.
Look into the namespace URL, https://web.resource.org/rss/1.0/modules/content/, and see its name: RDF Site Summary 1.0 Modules: Content
This is a relic of the RDF fork of RSS.
It was never designed for mainline RSS.
It’s a draft syntax that was never even approved
<content:encoded> is part of the Updated Syntax, of which it is said:
This section is a draft and has not yet been approved by the WG.
What was approved was version 1.0, which is completely different,
very complicated, very RDF, more powerful in a way few actually care about.
Its semantics are badly underspecified (and it doesn’t even say it’s HTML!)
An element whose contents are the entity-encoded or CDATA-escaped version of the content of the item.
“The content of the item” isn’t well-defined, but I think we all know what it means. But from the description of the module:
A module for the actual content of websites, in multiple formats.
Notice something odd? In multiple formats.
Hang on, where are the multiple formats?
In version 1.0 of the module,
the one that was actually approved.
The one no one uses because it’s painfully complicated (RDF!) and because almost no one actually cares about including multiple formats in the feed.
The one with a <content:items>
that contains an <rdf:Bag>
that contains one or more <rdf:li>s
each with an <content:item>
containing a <content:format>,
probably an <rdf:value>,
and maybe a <content:encoding>,
and a bunch of RDF attributes like rdf:resource,
and altogether too many moving parts.
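To give a feel for those moving parts, here is a sketch of what a consumer would have to do just to list the formats on offer. The sample item is my own reconstruction from the element names above, not copied from the spec. The namespace URIs are the usual RDF and Content module ones, the XHTML format URI is illustrative, and the plain-text format URI is entirely made up.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CONTENT = "http://purl.org/rss/1.0/modules/content/"

# Hypothetical item in the version 1.0 style; the structure follows the
# element names listed above, the values are invented.
ITEM = """<item xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <content:items>
    <rdf:Bag>
      <rdf:li>
        <content:item>
          <content:format rdf:resource="http://www.w3.org/1999/xhtml"/>
          <rdf:value>&lt;p&gt;Hello, world!&lt;/p&gt;</rdf:value>
        </content:item>
      </rdf:li>
      <rdf:li>
        <content:item>
          <content:format rdf:resource="http://example.com/formats/plain-text"/>
          <rdf:value>Hello, world!</rdf:value>
        </content:item>
      </rdf:li>
    </rdf:Bag>
  </content:items>
</item>"""


def formats_and_values(item_xml):
    """Return (format URI, value) pairs for each content:item in the bag."""
    item = ET.fromstring(item_xml)
    path = f"{{{CONTENT}}}items/{{{RDF}}}Bag/{{{RDF}}}li/{{{CONTENT}}}item"
    results = []
    for content_item in item.findall(path):
        fmt = content_item.find(f"{{{CONTENT}}}format")
        fmt_uri = fmt.get(f"{{{RDF}}}resource") if fmt is not None else None
        value = content_item.findtext(f"{{{RDF}}}value")
        results.append((fmt_uri, value))
    return results


for fmt_uri, value in formats_and_values(ITEM):
    print(fmt_uri, repr(value))
```

Even this simplified version needs four levels of nesting and two namespaces to say “here is the content, twice”.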
But what everyone uses is the draft version 2.0, with only one element, <content:encoded>.
I suspect a lot of people don’t realise what “encoded” means here:
they assume it means “interpret it as HTML, not text”,
but it actually means “encoded as XML character data”.
The format is never specified, and there’s no way of specifying it.
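A small sketch of what “encoded” amounts to: the entity-encoded and CDATA-escaped forms are just two spellings of the same character data, and an XML parser hands back the same string for both. The two sample items are invented for illustration, and the namespace URI is the conventional one for the Content module.

```python
import xml.etree.ElementTree as ET

CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"
TAG = f"{{{CONTENT_NS}}}encoded"

# Two hypothetical items carrying the same content, escaped two ways.
ENTITY_ENCODED = (
    '<item xmlns:content="http://purl.org/rss/1.0/modules/content/">'
    "<content:encoded>&lt;b&gt;Hi&lt;/b&gt; &amp;amp; bye</content:encoded>"
    "</item>"
)
CDATA_ESCAPED = (
    '<item xmlns:content="http://purl.org/rss/1.0/modules/content/">'
    "<content:encoded><![CDATA[<b>Hi</b> &amp; bye]]></content:encoded>"
    "</item>"
)


def decoded(item_xml):
    """Return the character data of content:encoded after XML decoding."""
    return ET.fromstring(item_xml).findtext(TAG)


a = decoded(ENTITY_ENCODED)
b = decoded(CDATA_ESCAPED)
print(a == b)   # True
print(repr(a))  # '<b>Hi</b> &amp; bye'
```

The parser’s job ends there. Whether that string is HTML (bold “Hi”, then “& bye”) or plain text (literal angle brackets and a literal “&amp;”) is something nothing in the element tells you.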
I believe that one of the reasons <content:encoded> gained popularity is that there were still enough people using <description> as plain text that <content:encoded> conveyed unambiguous HTML.
It is ironic that the actual spec never says it’s HTML.
As people use it, the namespace and local name are back to front
Arguable. Reflect more on this.
It’s often used for something other than the content of the item?
Hypothesis entirely unchecked: I get the impression from one or two speccy sorts of things I’ve read that podcasts are abusing it, but I haven’t surveyed actual podcast feeds at all yet.
In the early days, RSS <description> was text.
Things are made worse by the fact that, e.g., until RSS 2.0, `<description>` didn’t specify whether it could contain HTML, but people started putting HTML in it anyway. `<description>` and `<content:encoded>` are semantically *identical*… in theory.
The `content` XML namespace is for “RDF Site Summary 1.0 Modules: Content”. But *no one* implements version 1.0; they implement the 2.0 draft, which has never been approved by “the WG” (I honestly don’t know who that is, as they don’t explain, and the link is dead) and which is completely incompatible with 1.0. It’s not clear from the labelling whether the “1.0” in the title pertains to RSS (and fun times with what the acronym RSS stands for) or to Content; that is, I honestly don’t know whether the “1.0” in the title is different from the “1.0” and “2.0” of the module’s own versions.
`content:encoded` is defined in an RSS 1.0 module. So basically people are using a *draft* RSS 1.0 extension feature with RSS 2.0, despite the fact that RSS 2.0 decisively obviates that specific feature. Argh! It’s all a confusing mess. RSS is *organic* in the worst way imaginable.
(RSS 1.0 went all-in on RDF, including in name, claiming to be completely backwards-compatible with 0.90, but in fact is 100% incompatible by virtue of existing in a different namespace.)
The RSS 1.0 spec can’t even load its stylesheets, since the CSS file is served with MIME type text/plain.
These days, I would say the only.
My advice: just use Atom
If you want to distinguish between a description and the full content,
Atom has your back: <summary> and <content>.
If you want to make it clear you’ve got HTML encoded as text,
Atom has your back: type="html".
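As a minimal sketch of how that looks from the consumer’s side: the entry below is invented and is not a complete valid Atom entry (it lacks the required id and updated), and the helper name text_construct is hypothetical. It only handles type="text" and type="html", where the value is character data. type="xhtml" carries child elements and would need different handling.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"

# Hypothetical entry fragment, invented for illustration.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Hello</title>
  <summary>A short greeting.</summary>
  <content type="html">&lt;p&gt;Hello, world!&lt;/p&gt;</content>
</entry>"""


def text_construct(entry, name):
    """Return (type, value) for a child such as summary or content."""
    el = entry.find(f"{{{ATOM}}}{name}")
    if el is None:
        return None
    # Atom says a missing type attribute means "text".
    return (el.get("type", "text"), el.text)


entry = ET.fromstring(ENTRY)
print(text_construct(entry, "summary"))  # ('text', 'A short greeting.')
print(text_construct(entry, "content"))  # ('html', '<p>Hello, world!</p>')
```

The point is that the format travels with the element, so the consumer doesn’t have to guess.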
RSS is clearly hopeless, if in practice people rely on pulling in bits and pieces of other formats.
<content:encoded>? That’s from an extension to a completely different format, RDF Site Summary. <atom:link>? Hah, Atom envy. (I really don’t understand why that one is broadly recommended. I can’t think of anything that would benefit from it.)
So I say just use Atom and skip the mess that is RSS.