
What's the best library for parsing RSS/Atom in Perl?

I notice that XML::RSS::Parser hasn't been updated since 2005. Is this still the recommended library for parsing RSS or Atom? Is there a better one or a better way?

xenoterracide asked Oct 20 '10 01:10


4 Answers

I'm not sure it's ever been the "recommended library". If I know which kind of feed I need to parse, I use XML::RSS or XML::Atom as appropriate, but if (as is more likely) I just know it's a web feed, I use XML::Feed.

Adding an example of using XML::Feed, as requested:

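use XML::Feed;

# $string_containing_feed holds the raw RSS or Atom XML
my $feed = XML::Feed->parse(\$string_containing_feed)
  or die XML::Feed->errstr;

# Print the title and body of every entry in the feed
foreach my $entry ($feed->entries) {
  print $entry->title, "\n";
  print $entry->content->body, "\n";
}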

This is all pretty much copied from the module documentation.

Dave Cross answered Oct 16 '22 06:10


I actually like to avoid domain-specific XML parsers these days and just use XPath for everything. That way I only have to remember one API. (Unless it's a huge XML file, in which case I'll use an event-based parser like XML::Parser.)

So using XML::XPath, I can grab a bunch of stuff from an RSS file like this:

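use XML::XPath;

# get_rss() is a placeholder for whatever returns the feed as an XML string
my $rss = get_rss();
my $xp = XML::XPath->new( xml => $rss );

# Select every <item> in the RSS channel
my $stories = $xp->find( '/rss/channel/item' );

foreach my $story ( $stories->get_nodelist ) {
    # Evaluate each path relative to the current <item> node
    my $url   = $xp->find( 'link',  $story )->string_value;
    my $title = $xp->find( 'title', $story )->string_value;
    ...
}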

Not the prettiest code in the world, but it works.

friedo answered Oct 16 '22 04:10


If XML::RSS::Parser works for you then use it. I've used XML::Parser to deal with RSS but I had narrow requirements and XML::Parser was already installed.

Just because something hasn't been updated in a few years doesn't mean that it doesn't work anymore; I don't think the various RSS/Atom specs have changed recently, so there's no need for the parser to change.
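
For anyone wondering what the XML::Parser route looks like, here is a minimal sketch that collects the item titles from an RSS 2.0 document. The inline feed and the variable names are made up for illustration, and it assumes you only care about the title inside each item; a real feed would need more handling (Atom, namespaces, CDATA and so on).

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical sample feed, inlined so the sketch is self-contained
my $rss = <<'XML';
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First story</title><link>http://example.com/1</link></item>
    <item><title>Second story</title><link>http://example.com/2</link></item>
  </channel>
</rss>
XML

my @titles;          # collected item titles
my $in_item  = 0;    # true while inside an <item>
my $in_title = 0;    # true while inside an item's <title>
my $text     = '';   # character data gathered for the current title

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ( $expat, $element ) = @_;
            $in_item = 1 if $element eq 'item';
            if ( $in_item && $element eq 'title' ) {
                $in_title = 1;
                $text     = '';
            }
        },
        # Char may be called several times per element, so append
        Char => sub {
            my ( $expat, $string ) = @_;
            $text .= $string if $in_title;
        },
        End => sub {
            my ( $expat, $element ) = @_;
            if ( $in_item && $element eq 'title' ) {
                push @titles, $text;
                $in_title = 0;
            }
            $in_item = 0 if $element eq 'item';
        },
    },
);

$parser->parse($rss);
print "$_\n" for @titles;
```

The channel's own title is skipped because the handlers only record a title while inside an item.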

mu is too short answered Oct 16 '22 05:10


There is also a very nice module called XML::FeedPP (see http://search.cpan.org/dist/XML-FeedPP/lib/XML/FeedPP.pm). FeedPP is not especially fast, but it is written in almost pure Perl and has minimal dependencies.
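
A rough sketch of how reading a feed with XML::FeedPP might look. The inline feed is invented for the example, and passing -type => 'string' assumes the XML is already in a scalar; as far as I know the constructor also accepts a URL or a file name as the source.

```perl
use strict;
use warnings;
use XML::FeedPP;

# Hypothetical sample feed, inlined so the sketch is self-contained
my $xml = <<'XML';
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <item><title>First story</title><link>http://example.com/1</link></item>
    <item><title>Second story</title><link>http://example.com/2</link></item>
  </channel>
</rss>
XML

# Parse the string; the feed format is detected by the module
my $feed = XML::FeedPP->new( $xml, -type => 'string' );

print "Feed: ", $feed->title, "\n";

# get_item() with no index returns every item in the feed
my @items = $feed->get_item;
foreach my $item (@items) {
    print $item->title, " <", $item->link, ">\n";
}
```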

ssvda answered Oct 16 '22 06:10