
How to check if an RSS feed has been updated in Python?

I am using the feedparser library in Python to read details from an RSS feed. Suppose I have pulled 25 headline titles from a news channel's RSS feed. An hour later I run feedparser again to get the latest list of 25 headline titles. The list may or may not have changed since the first run.

Some of the headlines might be the same and some might be new. I need to check whether any headlines are new compared with the ones pulled an hour earlier. Only the new headlines should be pushed into the database, to avoid storing duplicates.

The code looks like below:

import feedparser

d = feedparser.parse('www.news.example.xml')  # placeholder feed address
for item in d.entries:
    hndlr.write(item.title)  # hndlr is the database handler; each title is written to the database

I need to run the above code every hour and check whether any of the headlines (titles) have changed. If the data differs from what was extracted an hour earlier, only the new data should be written to the database.

user1452759 asked Jan 10 '13 11:01



2 Answers

Each feed item has an identifier, available as item.id. Track those identifiers together with each item's .updated (or .updated_parsed) value to detect new items.

On each run, check whether you have already seen the item (via item.id) and, if so, whether it has been updated since the last time you checked (via item.updated or item.updated_parsed).
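A minimal sketch of that check, not a definitive implementation. The function name `find_new_or_updated` and the `seen` dictionary are hypothetical, and the sketch assumes every entry carries an `id`. It relies only on `.get()`, which feedparser entries support, so `d.entries` can be passed in directly.

```python
def find_new_or_updated(entries, seen):
    """Return the entries that are new or changed since the last run.

    `seen` maps an entry id to the `updated` string recorded on the
    previous run; it is modified in place so it can be reused next time.
    """
    fresh = []
    for entry in entries:
        entry_id = entry.get("id")
        if entry_id is None:
            # Assumption: entries without an id are skipped in this sketch.
            continue
        updated = entry.get("updated")
        # New item, or a known item whose updated value has changed.
        if entry_id not in seen or seen[entry_id] != updated:
            fresh.append(entry)
            seen[entry_id] = updated
    return fresh
```

To keep this working across hourly runs, `seen` would have to be persisted somewhere (a file or a database table) rather than held in memory.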

Do take advantage of feedparser's ETag support to check whether the feed contents have changed, though. This only saves you from downloading a feed that has no new items; you still need to detect which items have been added or updated when you do get a fresh copy of the feed.
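As a rough sketch of the ETag check: `feedparser.parse` accepts `etag` and `modified` arguments, and the result reports a `status` of 304 when the server says nothing has changed. The wrapper `fetch_if_changed`, the `state` dictionary, and the injectable `parse` argument are hypothetical names added for illustration.

```python
def fetch_if_changed(url, state, parse=None):
    """Fetch a feed, returning None if the server reports no change.

    `state` holds the `etag` and `modified` values from the previous
    fetch and is updated in place after a successful download.
    """
    if parse is None:
        import feedparser  # third-party: pip install feedparser
        parse = feedparser.parse
    d = parse(url, etag=state.get("etag"), modified=state.get("modified"))
    if d.get("status") == 304:
        # Not modified: no feed body was sent, so there is nothing to process.
        return None
    # Remember the validators for the next request; either may be missing
    # if the server does not send them.
    state["etag"] = d.get("etag")
    state["modified"] = d.get("modified")
    return d
```

When the function returns a parsed feed, its entries still have to be compared against what was stored earlier, since a changed feed usually repeats many old items.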

Martijn Pieters answered Oct 05 '22 06:10


For "good" feeds you can use the ETag and Last-Modified (If-Modified-Since) mechanism, which is described here: http://www.kbcafe.com/rss/rssfeedstate.html

But some servers don't support it, so in that case you simply need to check the post dates and ids and see whether those posts are already in your DB.
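One way to do that database check, sketched with the standard-library sqlite3 module. The table name `headlines`, its columns, and the function `store_new_entries` are hypothetical, and falling back to the entry's `link` when no `id` is present is an assumption, not something the feed guarantees to be unique.

```python
import sqlite3


def store_new_entries(conn, entries):
    """Insert entries not yet in the table; return the titles that were added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS headlines ("
        "id TEXT PRIMARY KEY, title TEXT, updated TEXT)"
    )
    added = []
    for entry in entries:
        # Assumption: use the link as the key when the feed omits item ids.
        entry_id = entry.get("id") or entry.get("link")
        if entry_id is None:
            continue
        row = conn.execute(
            "SELECT 1 FROM headlines WHERE id = ?", (entry_id,)
        ).fetchone()
        if row is None:
            # Not in the DB yet, so this is a new post.
            conn.execute(
                "INSERT INTO headlines (id, title, updated) VALUES (?, ?, ?)",
                (entry_id, entry.get("title"), entry.get("updated")),
            )
            added.append(entry.get("title"))
    conn.commit()
    return added
```

Called once an hour with `d.entries` from feedparser, this would store only posts whose id has not been seen before; it does not refresh rows for posts whose date has changed.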

cleg answered Oct 05 '22 07:10