Sunday, April 24, 2005

Xanga feeds suck ass

Warning: obsessive web standard geek ranting follows. Skip if this sounds boring to you.

My unparalleled array of procrastination techniques includes intermittently hacking on a home-grown news aggregator for KDE. One of my friends uses Xanga, and I've gotten sick of reading his posts in a web browser. I figured that in the year 2005, a major blog host like Xanga would have some kind of XML-based feed for their users' readers, so I'd just subscribe to that.

If you look at a Xanga blog, you'll see no links to a syndication feed (no RSS, no Atom, no XML), but some Googling reveals that Xanga does publish an undocumented feed for every user. For any URL

http://www.xanga.com/home.aspx?user=username

there exists a corresponding URL

http://www.xanga.com/rss.aspx?user=username

where something that resembles an RSS 0.91 feed resides. But the feed's busted, in two different ways:

  • The <item> tags are not beneath the <channel> tag, as they're supposed to be, but directly beneath the top-level <rss> tag, as siblings of <channel>, much as RSS 0.9 places them alongside <channel> under its root element. This is a bona fide validation error in violation of the RSS 0.91 spec.
  • All Xanga posts are marked with a date, and optionally can be marked with a title. It would be natural to use each entry's date and title as, you know, its date and title in the feed. Instead, Xanga feeds use the post date as the title! To pile further silliness upon silliness, the date isn't written using any standard date format like RFC 822 or ISO 8601, nor is it even formatted YYYY/MM/DD hh:mm:ss (which would at least make alphabetic sorting equivalent to chronological sorting). Instead, it's formatted as MM/DD/YYYY hh:mm:ss tt, where tt is the AM/PM marker.

    Omitting the date and providing junk as the title are technically legal within the RSS 0.91/0.92/2.0 standards (which are sucky standards to begin with, but that's a whine for another day). However, doing this can be pretty annoying for any RSS processor that wants to do anything more sophisticated than list the headlines in some random order.
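To make the sorting complaint concrete, here's a rough Python sketch. The three titles are made up, and the strptime pattern is only my guess at how MM/DD/YYYY hh:mm:ss tt gets rendered (12-hour clock, AM/PM marker); treat both as assumptions, not as anything Xanga documents.

```python
from datetime import datetime

# Made-up sample titles in the MM/DD/YYYY hh:mm:ss tt style described above.
titles = [
    "12/31/2004 11:59:00 PM",
    "01/05/2005 09:30:00 AM",
    "04/24/2005 01:15:00 PM",
]

# Assumed strptime equivalent of MM/DD/YYYY hh:mm:ss tt.
TITLE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_title(title):
    """Turn a date-as-title string into a datetime."""
    return datetime.strptime(title.strip(), TITLE_FORMAT)


# Sorting the raw strings puts the month first, so December 2004
# lands after April 2005.
alphabetical = sorted(titles)

# Sorting on the parsed value gives the real posting order.
chronological = sorted(titles, key=parse_title)

print(alphabetical)
print(chronological)
```

With a year-first format the two lists would come out identical; with this one they don't, so an aggregator can't order the posts without parsing the title first.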

To compensate for these problems, I added two goofy hacks to my aggregator. First, in RSS 0.91/0.92/2.0 feeds, I hunt for <item> tags at the top level when they're absent under <channel>. Second, I try to parse a date from the title when no date is provided. Now, anybody who's hacked on a web browser will no doubt laugh at this, because web browser developers have to deal with the heinously deformed HTML that people throw on their web pages, and the above looks trivial in comparison. But the point of XML syndication was that aggregators weren't supposed to have to do this sort of thing, which is why Xanga's brokenness is irksome. It's especially annoying because both of the above problems could probably be fixed by a programmer at Xanga in about fifteen minutes, thus saving news aggregator authors the world over many hours of collective time.
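For the curious, the two hacks look roughly like this. This is a Python sketch and not the aggregator's actual code; the function names, the sample feed, and the strptime pattern for the title are all made up for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from email.utils import parsedate_to_datetime

# Assumed strptime equivalent of MM/DD/YYYY hh:mm:ss tt.
TITLE_FORMAT = "%m/%d/%Y %I:%M:%S %p"

# Made-up feed with the <item> misplaced outside <channel>.
SAMPLE = """<rss version="0.91">
  <channel>
    <title>username's site</title>
    <link>http://www.xanga.com/home.aspx?user=username</link>
    <description>made-up sample feed</description>
  </channel>
  <item>
    <title>04/24/2005 01:15:00 PM</title>
    <link>http://www.xanga.com/home.aspx?user=username</link>
    <description>Post body goes here.</description>
  </item>
</rss>"""


def find_items(root):
    """Hack 1: use <item> under <channel>; if there are none, look under the root."""
    channel = root.find("channel")
    items = channel.findall("item") if channel is not None else []
    if not items:
        items = root.findall("item")
    return items


def item_date(item):
    """Hack 2: use <pubDate> if it parses; otherwise try to read a date from the title."""
    pub_date = item.findtext("pubDate")
    if pub_date:
        try:
            return parsedate_to_datetime(pub_date)
        except (TypeError, ValueError):
            pass  # unparseable pubDate: fall through to the title
    title = (item.findtext("title") or "").strip()
    try:
        return datetime.strptime(title, TITLE_FORMAT)
    except ValueError:
        return None  # the title really is just a title


root = ET.fromstring(SAMPLE)
items = find_items(root)
dates = [item_date(item) for item in items]
print(len(items), dates)
```

Both fallbacks only kick in when the proper location comes up empty, so a feed that follows the spec goes through the normal path untouched.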

On the other hand, in all likelihood, Xanga doesn't really want people to aggregate their users' posts externally. They'd much rather you sign up for Xanga and use their built-in subscription system. Which, I suppose, is fine, as long as all your friends use Xanga and nothing else. Grrr. I hate software that tries to lock you into its gated community.

Incidentally, LiveJournal, which is built on a similar business model, provides well-formed feeds in a variety of flavors.

2 comments: