RSS: Error-prone

Daniel Pope

I subscribe to only about a dozen RSS or Atom feeds, but more than half of them suffer from one problem or another.

Intermittently dumping a dozen duplicate posts.
Dumping a dozen duplicate posts on every refresh.
Duplicating the most recent post on every refresh.
Double-escaping HTML entities, so I see &ldquo;, &rdquo;, &hellip; and suchlike in post titles.
XML syntax errors causing total feed outage until some improperly encoded post drops off the feed.
<pre> code snippets that have lost their formatting.
And, of course, the occasional snippet of HTML that doesn't work as intended when removed from the context of the original HTML document and embedded in RSS.
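The double-escaping problem is easy to reproduce. Here is a minimal Python sketch, assuming the producer builds the feed with an XML library that already escapes text on output. The `render_title` helper and the sample title are hypothetical, for illustration only, and not taken from any particular feed generator.

```python
import html
import xml.etree.ElementTree as ET


def render_title(text: str, pre_escape: bool) -> str:
    """Serialize a <title> element, then parse it back as a feed reader would.

    Hypothetical helper for illustration only.
    """
    elem = ET.Element("title")
    # ElementTree escapes &, < and > itself when serializing, so escaping
    # the text beforehand means it gets escaped twice.
    elem.text = html.escape(text) if pre_escape else text
    xml_text = ET.tostring(elem, encoding="unicode")
    return ET.fromstring(xml_text).text


title = "Fish & Chips"
print(render_title(title, pre_escape=False))  # Fish & Chips
print(render_title(title, pre_escape=True))   # Fish &amp; Chips
```

In the second call the reader gets back a literal `&amp;` in the title, which is the same kind of debris I see in my feed reader.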

I often have to search for Pipes to get a useful feed. That is a consequence of the way RSS specifies only a data format and places no obligation on producers, an architectural flaw I've discussed before.

But quite aside from this, it seems that a significant proportion of feeds aren't implemented properly.

Obviously we can blame developers for bugs, but the design of RSS may well be a contributing factor. The process of encapsulating HTML fragments in XML is not as straightforward as it looks. At first glance, the requirement for a unique ID for each post does not look onerous. But does the ID correspond to the specific version of a post? Or does it correspond to the current version, however it may have changed since it was first published?
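The ID question has visible consequences. As a rough sketch, assume a reader that de-duplicates purely on the ID it is given. The two ID schemes below (`content_guid` and `stable_guid`), the `items_shown` helper and the example URL are all hypothetical, chosen only to contrast "ID per version" with "ID per post".

```python
import hashlib


def content_guid(permalink: str, body: str) -> str:
    """ID tied to one specific version: changes whenever the body is edited."""
    data = (permalink + "\n" + body).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def stable_guid(permalink: str, body: str) -> str:
    """ID tied to the post itself: unchanged however the body is edited."""
    return permalink


def items_shown(make_guid, versions) -> int:
    """Count the items a reader that de-duplicates on ID alone would display."""
    seen = set()
    for permalink, body in versions:
        seen.add(make_guid(permalink, body))
    return len(seen)


# The same post, fetched before and after a small edit.
versions = [
    ("https://example.com/posts/1", "First draft"),
    ("https://example.com/posts/1", "First draft, typo fixed"),
]
print(items_shown(content_guid, versions))  # 2
print(items_shown(stable_guid, versions))   # 1
```

With a per-version ID, every edit looks like a brand new post, which would explain at least some of the duplicates listed above. With a per-post ID the reader shows one item, but it has no way of knowing from the ID alone that the content changed.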

RSS may be useful, but it should also just work, and it doesn't. Developers and standardistas alike should start thinking about why.
