 

How to programmatically determine whether an RSS feed is a full feed or a partial feed

Tags:

rss

I need to programmatically determine whether an RSS feed exposes the full content of its articles or only excerpts of them. How would you do it?

pmurillo asked Feb 06 '09 16:02



2 Answers

Look for a link at the end that says "More", "Continued", "Full article", "..." or similar. Failing that, you could follow every link in the item and check whether the target page contains the text from the feed plus extra content.

Garry Shutler answered Oct 09 '22 04:10



I don't think there is a very clean way of doing this, but here are two "hacky" ones:

I'd parse the text of each RSS item and look for any links in it. Granted, there could be multiple links there (some to other blog posts), but if you focus on the last one and come up with a few heuristic words for the link text (e.g. "more", "read full"), you should be able to catch a lot of them. For more confidence, you can look only at the links that point back to the original blog.
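A minimal sketch of this heuristic, assuming you already have the item's description as an HTML string. The names `MORE_PATTERN`, `LinkCollector` and `looks_partial`, and the list of marker phrases, are illustrative choices and not part of any library; only the Python standard library is used.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical marker phrases; tune these for the feeds you care about.
MORE_PATTERN = re.compile(
    r"\b(more|continued?|read full|full (article|story|post))\b|\.\.\.|…",
    re.IGNORECASE,
)


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def looks_partial(description_html, blog_url=None):
    """Guess whether an item is an excerpt by inspecting its last link."""
    parser = LinkCollector()
    parser.feed(description_html)
    parser.close()
    links = parser.links
    if blog_url is not None:
        # Optionally keep only links that point back to the blog's host.
        host = urlparse(blog_url).netloc
        links = [(h, t) for h, t in links if urlparse(h).netloc == host]
    if not links:
        return False
    _, last_text = links[-1]
    return bool(MORE_PATTERN.search(last_text))
```

Passing `blog_url` restricts the check to links on the blog's own host, so a trailing ad or share link does not hide a "Read more" link. A word like "more" can also appear in ordinary link text, so expect some false positives.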

A more rigorous method would have you follow all the links and check whether the RSS fragment is a subset of the page that comes back, or whether there is a substantial overlap. This may not help when the site uses a true summary rather than a fragment of the full post, though.
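The overlap check could be sketched roughly as follows, assuming the linked page has already been fetched and both inputs are HTML strings. `TextExtractor`, `words` and `feed_coverage` are hypothetical helper names; the comparison uses `difflib.SequenceMatcher` from the standard library on word lists, which is just one of several ways to measure overlap.

```python
import re
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Gathers visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def words(html):
    """Lower-cased word list of the visible text in an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.findall(r"\w+", " ".join(parser.chunks).lower())


def feed_coverage(feed_html, page_html):
    """Fraction of the feed item's words found, in order, on the page."""
    feed_words = words(feed_html)
    page_words = words(page_html)
    if not feed_words:
        return 0.0
    matcher = SequenceMatcher(None, feed_words, page_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(feed_words)
```

A coverage close to 1.0 suggests the feed text was lifted from the page; a low value suggests a hand-written summary, which is the case this method cannot classify. Coverage alone does not say how much of the article the feed contains, so deciding between "full" and "excerpt" would still require some estimate of the article body's length, which this sketch does not attempt.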

Jean Barmash answered Oct 09 '22 04:10
