
Access to old, no longer available, feed entries

I am working on a project that requires reliable access to historical feed entries that are not necessarily still available in the website's current feed. I have found several ways to access such data, but none of them has all the characteristics I need.

Look at this as a brainstorm. I will list what I have found so far, and you can contribute if you have any other ideas.

  1. Google AJAX Feed API - will limit you to 250 items

  2. Unofficial Google Reader API - Perfect but unofficial and therefore unreliable (and perhaps quasi-illegal?). Also, the authentication seems to be tricky.

  3. Spinn3r - Costs a lot of money

  4. Spidering the Internet Archive's copies of the feed's site - Lots of complexity, spotty coverage, only useful as a last resort

  5. Yahoo! Feed API or Yahoo! Search BOSS - The first looks more like an aggregator, meaning I'd need a separate registration for each feed. The second should give more access to Yahoo's data, but I can find no mention of feeds.

  6. (thanks to Lou Franco) Bloglines Sync API - Besides the problem of needing an account and being designed more as an aggregator, it does not have a way to add feeds to the account. So no retrieval of arbitrary feeds. You need to manually add them through the reader first.

  7. Other search engines/blog search/whatever?

This is a really irritating problem as we are talking about semantic information that was once out there, is still (usually) valid, yet is difficult to access reliably, freely and without limits. Anybody know any alternative sources for feed entry goodness?

Alexandros Marinos asked Oct 03 '08 16:10



2 Answers

Bloglines has an API to sync accounts

http://www.bloglines.com/services/api/sync

You have to make an account and subscribe to the feed you want to download, but then you can download entries by date, which can be far in the past. I'm not sure about the terms of use.

Lou Franco answered Sep 28 '22 04:09



The best answer I've found so far is this: Google Reader's unofficial API turns out to have a public access point for feeds, which means no authentication is needed. Usage is as follows:

http://www.google.com/reader/public/atom/feed/{your feed uri here}?n=1000

Replace the text in the curly braces (including the braces themselves) with the feed URI you're interested in. More information about the precise arguments can be found here:

http://blog.martindoms.com/2009/10/16/using-the-google-reader-api-part-2/

Just remember to use the /public/ URL if you don't want to deal with authentication.
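For illustration, here is a minimal Python sketch of how the public URL above could be assembled. The function name `build_public_feed_url` is made up for this example, and percent-encoding the feed URI before placing it in the path is my assumption, not something I have confirmed the endpoint requires. The sketch only builds the URL string and does not fetch anything.

```python
from urllib.parse import quote

# Base of the public (no-authentication) endpoint quoted above.
READER_PUBLIC_BASE = "http://www.google.com/reader/public/atom/feed/"


def build_public_feed_url(feed_uri: str, n: int = 1000) -> str:
    """Return the public Reader URL for feed_uri, asking for n entries.

    Hypothetical helper: the name and the encoding choice are assumptions.
    """
    # Assumption: the feed URI is percent-encoded so that its own
    # ':' and '/' characters cannot be confused with the outer URL.
    encoded = quote(feed_uri, safe="")
    return f"{READER_PUBLIC_BASE}{encoded}?n={n}"


if __name__ == "__main__":
    # example.com stands in for whatever feed you are interested in.
    print(build_public_feed_url("http://example.com/feed.xml"))
```

The `n` argument maps to the `?n=1000` part of the URL shown earlier, so raising or lowering it changes how many entries are requested.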

Alexandros Marinos answered Sep 28 '22 05:09
