In Episode 78 of the Joel & Jeff podcast, one of the Doctype / Litmus guys states that you would never want to build a spider in Ruby. Would anyone like to guess at his reasoning for this?
Just how fast does a crawler need to be, anyhow? It depends upon whether you're crawling the whole web on a tight schedule, or gathering data from a few dozen pages on one web site.
With Ruby and the Nokogiri library, I can read this page and parse it in 0.01 seconds. Using XPath to extract data from the parsed page, I can turn all of the data into domain-specific objects in 0.16 seconds. All 223 rows.
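For reference, here is a minimal sketch of that kind of Nokogiri/XPath workflow, assuming a hypothetical page containing an HTML table; the URL, the XPath expressions, and the Row struct are illustrative placeholders, not from the original post:

    # Sketch: fetch a page, parse it with Nokogiri, and map table rows
    # to domain-specific objects via XPath. URL and selectors are assumptions.
    require 'open-uri'
    require 'nokogiri'

    Row = Struct.new(:name, :value)

    html = URI.open('https://example.com/data').read
    doc  = Nokogiri::HTML(html)

    rows = doc.xpath('//table//tr[td]').map do |tr|
      cells = tr.xpath('./td').map { |td| td.text.strip }
      Row.new(cells[0], cells[1])
    end

    puts "Parsed #{rows.size} rows"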
I am running into fewer and fewer problems where the traditional constraints (CPU/memory/disk) matter. This is an age of plenty. Where resources are not a constraint, don't ask "what's better for the machine?" Ask "what's better for the human?"
In my opinion it's just a matter of scale. If you're writing a simple scraper for your own personal use, or just something that will run on a single machine a couple of times a day, then you should choose whatever involves less code, effort, and maintenance pain. Whether that's Ruby is a different question (I'd pick Groovy over Ruby for this task: better threading plus very convenient XML parsing). If, on the other hand, you're scraping terabytes of data per day, then the throughput of your application is probably more important than a shorter development time.
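To make the single-machine case concrete, here is a small Ruby sketch of fetching a few pages concurrently with plain threads. This is not the Groovy approach mentioned above, just the same idea in Ruby; the URL list is a placeholder and error handling is intentionally minimal:

    # Sketch: fetch a handful of pages in parallel with Ruby threads.
    require 'open-uri'
    require 'nokogiri'

    urls = %w[https://example.com/a https://example.com/b https://example.com/c]

    titles = urls.map do |url|
      Thread.new { [url, Nokogiri::HTML(URI.open(url).read).title] }
    end.map(&:value)

    titles.each { |url, title| puts "#{url}: #{title}" }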
BTW, anyone who says that you would never want to use some technology in some context or another is most probably wrong.