Thursday, May 05, 2005

The killer aggregator feature waiting to be written

Four words: Google News meets Bloglines.

That should convey the idea to anybody who would understand what I'm saying, but I'll elaborate anyway. Basically, it would be far preferable to have a feed aggregator that presented blog entries from your subscription list grouped by content, not just by feed or by date. This would avoid the situation, presently all-too-common, wherein you see the same link seven times --- once via Atrios, and then again via all the people who read Atrios. It would also give you a way of reading the posts about each topic as a coherent thread of conversation, instead of a pointillistic series of textual bites.

It wouldn't even be that hard to hack up a prototype. A large fraction of blog posts --- especially the sort that you'd want to cluster --- consist of one or two links, some quoted text, and commentary; the outgoing links and quoted text (easy to detect: hyperlinks and <blockquote>'d regions) provide strong hooks for the clustering/threading algorithm to grab onto. A competent Python hacker, armed with Mark Pilgrim's feed parser, could probably write a web-based aggregator with this feature in a week. (And then, of course, spend another week tweaking the algorithm to catch important special cases and such.)
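To make the "strong hooks" idea concrete, here is a rough sketch of only the link-based half: posts that share an outgoing link land in the same cluster. It assumes the entries have already been fetched and reduced to (id, HTML body) pairs, for instance with a feed parser. The names `extract_links` and `cluster_posts` are made up for illustration, the URL normalization is deliberately crude, and quoted-text matching is left out entirely.

```python
from collections import defaultdict
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags in a post body."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Crude normalization (an assumption): trim whitespace
                    # and any trailing slash so near-identical URLs match.
                    self.links.add(value.strip().rstrip("/"))


def extract_links(html):
    """Return the set of normalized outgoing links in an HTML fragment."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


def cluster_posts(posts):
    """Group posts that share at least one outgoing link.

    posts: list of (post_id, html_body) pairs.
    Returns a sorted list of clusters, each a sorted list of post ids.
    """
    # Union-find over post ids: sharing a link merges two posts' clusters.
    parent = {pid: pid for pid, _ in posts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen = {}  # link -> first post id that used it
    for pid, html in posts:
        for link in extract_links(html):
            if link in first_seen:
                root_a, root_b = find(first_seen[link]), find(pid)
                if root_a != root_b:
                    parent[root_b] = root_a
            else:
                first_seen[link] = pid

    groups = defaultdict(list)
    for pid, _ in posts:
        groups[find(pid)].append(pid)
    return sorted(sorted(group) for group in groups.values())
```

Given one post that links to a story and a second that links to the same story "via" the first, both end up in one cluster, while a post about something else stays on its own. Clustering is transitive here, so a long chain of loosely related posts would collapse into a single thread; that is one of the special cases a real version would have to tune.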

Alas, I do not have a week to kill, so the world will have to wait longer for the magnificence of clustered feed aggregation. Unless somebody's already done it...

1 comment:

  1. SharpReader collects blog entries that link to a common page in a threaded view, like you suggest. But, it doesn't do everything you talked about, and it's Windows-only.