
Why should Ruby not be used to create a spider?

In Episode 78 of the Joel & Jeff podcast, one of the Doctype/Litmus guys states that you would never want to build a spider in Ruby. Would anyone like to guess at his reasoning for this?

Asked Jan 08 '10 by Ben

2 Answers

Just how fast does a crawler need to be, anyhow? It depends upon whether you're crawling the whole web on a tight schedule, or gathering data from a few dozen pages on one web site.

With Ruby and the Nokogiri library, I can read this page and parse it in 0.01 seconds. Using XPath to extract data from the parsed page, I can turn all of the data into domain-specific objects in 0.16 seconds. All 223 rows.
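For example, the fetch-parse-extract pattern looks roughly like this (a minimal sketch with a placeholder URL, XPath expressions, and record type; it is not the actual code behind the timings above):

    require 'net/http'
    require 'nokogiri'

    # Hypothetical record type for the extracted rows.
    Row = Struct.new(:name, :value)

    # Placeholder URL; point this at the page you actually want to scrape.
    url  = URI('https://example.com/data.html')
    html = Net::HTTP.get(url)

    # Parse the document once, then pull the fields out with XPath.
    doc  = Nokogiri::HTML(html)
    rows = doc.xpath('//table//tr').map do |tr|
      cells = tr.xpath('td').map { |td| td.text.strip }
      Row.new(cells[0], cells[1]) unless cells.empty?
    end.compact

    puts "Extracted #{rows.size} rows"

On a page of a couple of hundred rows, that whole pass takes a fraction of a second, which is the point: parsing speed is rarely the bottleneck.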

I am running into fewer and fewer problems where the traditional constraints (CPU/memory/disk) matter. This is an age of plenty. Where resources are not a constraint, don't ask "what's better for the machine?" Ask "what's better for the human?"

Answered by Wayne Conrad

In my opinion it's just a matter of scale. If you're writing a simple scraper for your own personal use, or something that will run on a single machine a couple of times a day, then you should choose whatever involves the least code, effort, and maintenance pain. Whether that's Ruby is a different question (I'd pick Groovy over Ruby for this task => better threading + very convenient XML parsing). If, on the other hand, you're scraping terabytes of data per day, then the throughput of your application is probably more important than a shorter development time.
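As a side note on the threading point: for an I/O-bound scraper, plain Ruby threads are usually good enough, because MRI releases its global lock while a thread is waiting on the network. A minimal sketch with placeholder URLs:

    require 'net/http'

    # Placeholder URLs, not taken from any real crawl.
    urls = %w[https://example.com/a https://example.com/b https://example.com/c]

    # Fetch the pages concurrently; Thread#value blocks until each result is ready.
    pages = urls.map { |u| Thread.new { Net::HTTP.get(URI(u)) } }.map(&:value)

    puts pages.map(&:length).inspect

With a handful of threads like this, the scraper spends its time waiting on the network rather than on Ruby itself.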

BTW, anyone who says that you would never want to use some technology in some context or another is most probably wrong.

Answered by psyho