
Web crawler in Ruby [closed]

What is your recommendation for writing a web crawler in Ruby? Is there any library better than Mechanize?

asked Nov 09 '10 by pierrotlefou


2 Answers

I'd give Anemone a try. It's simple to use, especially if you have to write a simple crawler, and in my opinion it is well designed too. For example, using it I wrote a Ruby script to search for 404 errors on my sites in a very short time.
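A minimal sketch of such a 404 checker with Anemone might look like the following. The site URL and depth limit are assumptions, not part of the answer, and the crawl is guarded behind an environment variable so the script only touches the network when you opt in:

```ruby
# Requires the anemone gem (gem install anemone) to actually crawl.

# Treat any 4xx/5xx HTTP status as a broken link.
def broken?(status)
  status.to_i >= 400
end

# Set CRAWL_SITE=http://your-site.example to run the crawl for real.
if ENV['CRAWL_SITE']
  require 'anemone'
  Anemone.crawl(ENV['CRAWL_SITE'], depth_limit: 2) do |anemone|
    anemone.on_every_page do |page|
      # page.code is the HTTP status Anemone recorded for this page
      puts "#{page.code} #{page.url}" if broken?(page.code)
    end
  end
end
```

`on_every_page` is called once per fetched page, so filtering on the status code there is enough to collect all the broken links in one pass.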

answered Sep 17 '22 by lucapette


If you just want to get pages' content, the simplest way is to use the open-uri functions. They don't require additional gems; you just have to require 'open-uri' and... http://ruby-doc.org/stdlib-2.2.2/libdoc/open-uri/rdoc/OpenURI.html
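A small sketch of the open-uri approach: the real fetch (shown in the comment) needs network access, so this version reads the same kind of HTML from a local temp file instead. The title-extracting regex is just an illustration, not something from the answer:

```ruby
require 'open-uri'
require 'tempfile'

# With open-uri loaded, fetching a real page is one line (network required):
#   html = URI.open('http://example.com').read

# Offline sketch: write a tiny page to a temp file and read it back.
file = Tempfile.new(['page', '.html'])
file.write('<html><head><title>Hello</title></head><body>Hi</body></html>')
file.close

html = File.read(file.path)

# Pull the <title> text out with String#[] and a capture group.
title = html[%r{<title>(.*?)</title>}m, 1]
puts title  # prints "Hello"
```

Once you have the HTML string, everything downstream (regexes here, a proper parser below) works the same whether it came from the network or a file.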

To parse the content you can use Nokogiri or other gems, which also support, for example, XPath queries. You can find other parsing libraries here on SO.

answered Sep 20 '22 by Nakilon