
High-performance RSS/Atom parsing with Ruby on Rails

I need to parse thousands of feeds and performance is an essential requirement. Do you have any suggestions?

Thanks in advance!

collimarco asked Feb 14 '09 13:02


3 Answers

I haven't tried it, but I recently read about Feedzirra, which claims to be built for performance:

Feedzirra is a feed library that is designed to get and update many feeds as quickly as possible. This includes using libcurl-multi through the taf2-curb gem for faster http gets, and libxml through nokogiri and sax-machine for faster parsing.
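The two ideas in that description, overlapping many HTTP requests and parsing with a streaming (SAX-style) parser, can be roughly sketched using only libraries that ship with Ruby. This is not Feedzirra's API or implementation: it swaps in plain threads with Net::HTTP for libcurl-multi, and REXML's stream parser for Nokogiri and sax-machine. The names `fetch_all`, `entry_titles`, `TitleListener`, and `DEFAULT_FETCHER` are hypothetical helpers invented for this illustration.

```ruby
require 'net/http'
require 'uri'
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical default fetcher: a plain GET that does not follow redirects.
DEFAULT_FETCHER = ->(url) { Net::HTTP.get(URI(url)) }

# Hypothetical helper: fetch all URLs with a fixed-size pool of worker threads.
# Returns a hash of url => body, or url => exception if that fetch failed.
def fetch_all(urls, workers: 8, fetcher: DEFAULT_FETCHER)
  queue = Queue.new
  urls.each { |url| queue << url }
  results = {}
  lock = Mutex.new

  threads = Array.new([workers, urls.size].min) do
    Thread.new do
      loop do
        url = begin
          queue.pop(true) # non-blocking pop; raises ThreadError when empty
        rescue ThreadError
          break
        end
        body = begin
          fetcher.call(url)
        rescue StandardError => e
          e # record the failure instead of killing the worker
        end
        lock.synchronize { results[url] = body }
      end
    end
  end
  threads.each(&:join)
  results
end

# Event-driven listener: collects <title> text found directly inside an
# RSS <item> or Atom <entry>, without building a document tree.
class TitleListener
  include REXML::StreamListener
  attr_reader :titles

  def initialize
    @titles = []
    @stack = []
    @buffer = nil
  end

  def tag_start(name, _attrs)
    @stack.push(name)
    @buffer = String.new if name == 'title' && %w[item entry].include?(@stack[-2])
  end

  def text(value)
    @buffer << value if @buffer
  end

  def cdata(value)
    @buffer << value if @buffer
  end

  def tag_end(name)
    if name == 'title' && @buffer
      @titles << @buffer.strip
      @buffer = nil
    end
    @stack.pop
  end
end

# Hypothetical helper: return the entry titles of one feed document.
def entry_titles(xml)
  listener = TitleListener.new
  REXML::Document.parse_stream(xml, listener)
  listener.titles
end
```

With real feeds, this could be used as `fetch_all(urls).each { |url, body| puts entry_titles(body) unless body.is_a?(StandardError) }`. Threads help here because the work is mostly waiting on the network; a libcurl-based client and a libxml-based parser, as Feedzirra describes, would likely go further.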

James Mead answered Oct 05 '22 07:10


You can use RFeedParser, a Ruby port of the well-known Python Universal Feed Parser. It's based on Hpricot, and it's really fast and easy to use.

http://rfeedparser.rubyforge.org/

An example:

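require 'rubygems'
require 'rfeedparser'
require 'open-uri'

# Fetch the feed over HTTP and hand the resulting IO object to the parser.
# Note: on Ruby 3.0 and later, open-uri no longer patches Kernel#open; use URI.open there.
feed = FeedParser.parse(open('http://feeds.feedburner.com/engadget'))

# Print the title of every entry in the feed.
feed.entries.each do |entry|
  puts entry.title
end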

Héctor Vergara answered Oct 05 '22 07:10


When all you have is a hammer, everything looks like a nail. Consider a solution other than Ruby for this. Though I love Ruby and Rails and would not part with them for web development, or perhaps for a domain-specific language, I prefer that heavy data lifting of the kind you describe be done in Java, or perhaps Python or even C++.

Since the parsed data is most likely headed for a database, the database can act as the common point between the Rails portion of your solution and the portion written in the other language. Then you're using the best tool for each of your problems, and the result is likely easier to work on and truly meets your requirements.

If speed is truly of the essence, why add an additional constraint and say, "Oh, it's only of the essence as long as I get to use Ruby"?

John Munsch answered Oct 05 '22 07:10