
Python RSS Parser that also handles FeedBurner

I'm in the middle of writing a Python parser script for RSS feeds. I'm using feedparser, but I'm stuck on parsing feeds from FeedBurner. Who needs FeedBurner nowadays? Anyway...

For example, I couldn't find a way to parse these two feeds:

http://feeds.wired.com/wired/index

http://feeds2.feedburner.com/ziffdavis/pcmag

When I pass those URLs to the feedparser library, they don't seem to work. I tried appending ?fmt=xml or ?format=xml to the URLs, but I still didn't get the feeds in XML format.

Do I need to use an HTML parser such as BeautifulSoup to parse FeedBurner feeds? Preferably, is there a public Python parser or aggregator script that handles this already?

Any tips or help will be greatly appreciated.

DavidL asked Apr 19 '11 21:04




1 Answer

It's possible you have a version issue or you're using the API incorrectly; it would help to see your error message. For example, the following works with Python 2.7 and feedparser 5.0.1:

>>> import feedparser
>>> url = 'http://feeds2.feedburner.com/ziffdavis/pcmag'
>>> d = feedparser.parse(url)
>>> d.feed.title
u'PCMag.com: New Product Reviews'
>>> d.feed.link
u'http://www.pcmag.com'
>>> d.feed.subtitle
u"First Look At New Products From PCMag.com including Lab Tests, Ratings, Editor's and User's Reviews."
>>> len(d['entries'])
30
>>> d['entries'][0]['title']
u'Canon Color imageClass MF9280cdn'

And with the other URL:

>>> url = 'http://feeds.wired.com/wired/index'
>>> d = feedparser.parse(url)
>>> d.feed.title
u'Wired Top Stories'
>>> d.feed.link
u'http://www.wired.com/rss/index.xml'
>>> d.feed.subtitle
u'Top Stories<img src="http://www.wired.com/rss_views/index.gif" />'
>>> len(d['entries'])
30
>>> d['entries'][0]['title']
u'Heart of Dorkness: LARPing Goes Haywire in <em>Wild Hunt</em>'
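
If the same calls fail on your machine, the parsed result itself usually carries the clues. Below is a minimal sketch, not a definitive recipe: the helper names `diagnose` and `diagnose_url` are my own and are not part of feedparser. It only reads the `status`, `bozo`, `bozo_exception`, `version` and `entries` fields that feedparser sets on its result, and it assumes the result supports `.get()` (feedparser's result does, and so does a plain dict). It is written so that it should run on Python 2.7 as well as Python 3.

```python
def diagnose(result):
    """Return a list of short notes describing a feedparser result.

    `result` is whatever feedparser.parse() returned; any mapping with
    a .get() method works, which keeps this helper easy to try out.
    """
    notes = []

    # 'status' is only set when the feed was fetched over HTTP.
    status = result.get("status")
    if status is not None:
        notes.append("HTTP status: %s" % status)

    # 'bozo' is truthy when the document was not well-formed XML;
    # 'bozo_exception' then holds the underlying error.
    if result.get("bozo"):
        notes.append("not well-formed: %r" % (result.get("bozo_exception"),))

    # 'version' is an empty string when the feed type was not recognized,
    # which is typical when an HTML page came back instead of a feed.
    if not result.get("version"):
        notes.append("feed format not recognized")

    notes.append("entries: %d" % len(result.get("entries", [])))
    return notes


def diagnose_url(url):
    """Fetch and parse `url` with feedparser, then describe the result."""
    import feedparser  # third-party; imported here so diagnose() stands alone
    return diagnose(feedparser.parse(url))


# Example (needs feedparser and network access):
# for note in diagnose_url('http://feeds2.feedburner.com/ziffdavis/pcmag'):
#     print(note)
```

Posting the notes this prints for a failing URL, together with your feedparser version, would make it much easier to tell a malformed feed apart from an HTML page being served in place of the feed.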
ars answered Oct 06 '22 00:10
