How do I write a web scraper in Ruby?

I would like to crawl a popular site (say, Quora) that doesn't have an API, extract some specific information, and dump it into a file: a CSV, a .txt, or a nicely formatted .html file :)
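Whatever the scraping step ends up looking like, the output step needs only Ruby's standard library. Here is a minimal sketch. It assumes the scraper has already produced an array of hashes with the hypothetical keys `:name`, `:occupation` and `:bio`. `CSV.generate` takes care of quoting fields that contain commas. `ERB::Util.html_escape` makes sure scraped text is displayed in the HTML version rather than interpreted as markup.

```ruby
require "csv"
require "erb"

# Hypothetical scraped records; the keys are assumptions for illustration.
records = [
  { name: "Ada", occupation: "UX designer", bio: "Designs checkout flows, forms and menus" },
  { name: "Ben", occupation: "Backend engineer", bio: "Writes <Ruby> services & APIs" }
]

COLUMNS = %i[name occupation bio].freeze

# CSV: a header row, then one row per record. The csv library quotes
# any field containing a comma, quote or newline.
def to_csv(records)
  CSV.generate do |csv|
    csv << COLUMNS.map(&:to_s)
    records.each { |r| csv << r.values_at(*COLUMNS) }
  end
end

# HTML: a plain table. Every scraped value is escaped so that text such
# as "<Ruby>" is shown as text instead of being parsed as a tag.
def to_html(records)
  header = COLUMNS.map { |c| "<th>#{c}</th>" }.join
  rows = records.map do |r|
    cells = r.values_at(*COLUMNS).map { |v| "<td>#{ERB::Util.html_escape(v.to_s)}</td>" }
    "  <tr>#{cells.join}</tr>"
  end
  ["<table>", "  <tr>#{header}</tr>", *rows, "</table>"].join("\n")
end

puts to_csv(records)
puts to_html(records)
```

To save either result, pass the string to `File.write`, e.g. `File.write("bios.csv", to_csv(records))`. A .txt dump could be as simple as joining the bios with newlines.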

E.g., return only a list of the bios of all Quora users whose publicly available information lists the occupation 'UX designer'.
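Once the profiles have been scraped, the filtering in this example is plain Ruby. The sketch below assumes each profile has already been parsed into a `Profile` struct. That name and its fields are hypothetical and are not taken from Quora's pages.

```ruby
# Hypothetical parsed profiles; in a real scraper these would come from
# the profile pages. Struct name and fields are assumptions.
Profile = Struct.new(:name, :occupation, :bio)

profiles = [
  Profile.new("Ada", "UX Designer", "Designs checkout flows."),
  Profile.new("Ben", "Backend engineer", "Writes Ruby services."),
  Profile.new("Cleo", "Senior UX designer", "Runs usability studies.")
]

# Keep only profiles whose occupation mentions the target role
# (case-insensitive substring match), and return just their bios.
def bios_for(profiles, role)
  needle = role.strip.downcase
  profiles.select { |p| p.occupation.to_s.downcase.include?(needle) }
          .map(&:bio)
end

puts bios_for(profiles, "UX designer")
```

The substring match is a design choice: it also keeps "Senior UX designer". Comparing with `==` after `downcase` would restrict the result to the exact title.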

How would I do that in Ruby?

I have a moderate understanding of how Ruby and Rails work. I just completed a Rails app, almost entirely written by myself. But I am no guru by any stretch of the imagination.

I understand regexes, etc.

asked May 10 '11 08:05 by marcamillion





1 Answer

Your best bet would be to use Mechanize. It can follow links, submit forms, and do anything else you will need from a web client. By the way, don't use regexes to parse HTML; use an HTML parser.
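To make the regex advice concrete, this small, dependency-free snippet shows two common ways a regex misreads HTML. The markup and the `bio` class are made up for illustration and are not Quora's actual HTML.

```ruby
# 1. A non-greedy (.*?) capture stops at the FIRST closing tag, so a bio
#    that contains a nested <div> is cut short.
nested = '<div class="bio">UX designer <div class="note">at Acme</div> in Berlin</div>'
non_greedy = nested[%r{<div class="bio">(.*?)</div>}m, 1]

# 2. A greedy (.*) capture runs to the LAST closing tag, so two sibling
#    bios are merged into a single match.
siblings = '<div class="bio">Ada</div><div class="bio">Cleo</div>'
greedy = siblings[%r{<div class="bio">(.*)</div>}m, 1]

puts non_greedy # UX designer <div class="note">at Acme
puts greedy     # Ada</div><div class="bio">Cleo
```

An HTML parser builds the element tree, so it handles both nested and sibling elements correctly. With Mechanize, for example, `page.search("div.bio")` runs a CSS query against the Nokogiri document behind the page. Calling `.text` on each result then returns that element's full text. The selector here is hypothetical.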

answered Oct 07 '22 19:10 by Geo
