Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I turn an RSS feed back into RSS?

Tags:

python

rss

According to the feedparser documentation, I can turn an RSS feed into a parsed object like this:

import feedparser
d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml')

but I can't find anything showing how to go the other way; I'd like to be able do manipulate 'd' and then output the result as XML:

print d.toXML()

but there doesn't seem to be anything in feedparser for going in that direction. Am I going to have to loop through d's various elements, or is there a quicker way?

like image 523
Simon Avatar asked Oct 08 '08 08:10

Simon


2 Answers

Appended is a not hugely-elegant, but working solution - it uses feedparser to parse the feed, you can then modify the entries, and it passes the data to PyRSS2Gen. It preserves most of the feed info (the important bits anyway, there are somethings that will need extra conversion, the parsed_feed['feed']['image'] element for example).

I put this together as part of a little feed-processing framework I'm fiddling about with.. It may be of some use (it's pretty short - should be less than 100 lines of code in total when done..)

#!/usr/bin/env python
import datetime

# http://www.feedparser.org/
import feedparser
# http://www.dalkescientific.com/Python/PyRSS2Gen.html
import PyRSS2Gen

# Get the data
parsed_feed = feedparser.parse('http://reddit.com/.rss')

# Modify the parsed_feed data here

items = [
    PyRSS2Gen.RSSItem(
        title = x.title,
        link = x.link,
        description = x.summary,
        guid = x.link,
        pubDate = datetime.datetime(
            x.modified_parsed[0],
            x.modified_parsed[1],
            x.modified_parsed[2],
            x.modified_parsed[3],
            x.modified_parsed[4],
            x.modified_parsed[5])
        )

    for x in parsed_feed.entries
]

# make the RSS2 object
# Try to grab the title, link, language etc from the orig feed

rss = PyRSS2Gen.RSS2(
    title = parsed_feed['feed'].get("title"),
    link = parsed_feed['feed'].get("link"),
    description = parsed_feed['feed'].get("description"),

    language = parsed_feed['feed'].get("language"),
    copyright = parsed_feed['feed'].get("copyright"),
    managingEditor = parsed_feed['feed'].get("managingEditor"),
    webMaster = parsed_feed['feed'].get("webMaster"),
    pubDate = parsed_feed['feed'].get("pubDate"),
    lastBuildDate = parsed_feed['feed'].get("lastBuildDate"),

    categories = parsed_feed['feed'].get("categories"),
    generator = parsed_feed['feed'].get("generator"),
    docs = parsed_feed['feed'].get("docs"),

    items = items
)


print rss.to_xml()
like image 78
dbr Avatar answered Sep 19 '22 06:09

dbr


If you're looking to read in an XML feed, modify it and then output it again, there's a page on the main python wiki indicating that the RSS.py library might support what you're after (it reads most RSS and is able to output RSS 1.0). I've not looked at it in much detail though..

like image 40
Jon Cage Avatar answered Sep 18 '22 06:09

Jon Cage