According to the feedparser documentation, I can turn an RSS feed into a parsed object like this:
import feedparser
d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml')
but I can't find anything showing how to go the other way; I'd like to be able do manipulate 'd' and then output the result as XML:
print d.toXML()
but there doesn't seem to be anything in feedparser for going in that direction. Am I going to have to loop through d's various elements, or is there a quicker way?
Appended is a not hugely-elegant, but working solution - it uses feedparser to parse the feed, you can then modify the entries, and it passes the data to PyRSS2Gen. It preserves most of the feed info (the important bits anyway, there are somethings that will need extra conversion, the parsed_feed['feed']['image'] element for example).
I put this together as part of a little feed-processing framework I'm fiddling about with.. It may be of some use (it's pretty short - should be less than 100 lines of code in total when done..)
#!/usr/bin/env python
import datetime
# http://www.feedparser.org/
import feedparser
# http://www.dalkescientific.com/Python/PyRSS2Gen.html
import PyRSS2Gen
# Get the data
parsed_feed = feedparser.parse('http://reddit.com/.rss')
# Modify the parsed_feed data here
items = [
PyRSS2Gen.RSSItem(
title = x.title,
link = x.link,
description = x.summary,
guid = x.link,
pubDate = datetime.datetime(
x.modified_parsed[0],
x.modified_parsed[1],
x.modified_parsed[2],
x.modified_parsed[3],
x.modified_parsed[4],
x.modified_parsed[5])
)
for x in parsed_feed.entries
]
# make the RSS2 object
# Try to grab the title, link, language etc from the orig feed
rss = PyRSS2Gen.RSS2(
title = parsed_feed['feed'].get("title"),
link = parsed_feed['feed'].get("link"),
description = parsed_feed['feed'].get("description"),
language = parsed_feed['feed'].get("language"),
copyright = parsed_feed['feed'].get("copyright"),
managingEditor = parsed_feed['feed'].get("managingEditor"),
webMaster = parsed_feed['feed'].get("webMaster"),
pubDate = parsed_feed['feed'].get("pubDate"),
lastBuildDate = parsed_feed['feed'].get("lastBuildDate"),
categories = parsed_feed['feed'].get("categories"),
generator = parsed_feed['feed'].get("generator"),
docs = parsed_feed['feed'].get("docs"),
items = items
)
print rss.to_xml()
If you're looking to read in an XML feed, modify it and then output it again, there's a page on the main python wiki indicating that the RSS.py library might support what you're after (it reads most RSS and is able to output RSS 1.0). I've not looked at it in much detail though..
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With