
Where should I start in making a scraper or a bot using Python? [closed]

Tags: python, cgi

I'm not new to programming (I know Python), but I have no clue where to start in making a bot or a scraper with it. Should I study CGI programming, or does a scraper run as a plain Python script? Do I need to build a server for it? Thanks for the help.

Kyle asked Jun 19 '10 14:06



3 Answers

Here are some links to get you started.

  • Build a basic web scraper in Python
  • Scrapy: An open source web scraping framework for Python
  • Web scraping with Python. Part 1: Crawling
Brian Clapper answered Sep 20 '22 08:09



If you’re trying to scrape websites that make heavy use of JavaScript, you might find Selenium easier overall.

Selenium consists of a server that controls real web browsers on your machine, and a client library (including a Python port) that lets you drive those browsers and inspect the pages loaded in them.

It’s definitely more overhead up-front to configure (and figure out) the server and client library (and to make sure you have a working browser on your system), but if the website does a lot of stuff in JavaScript, your actual scraping code could be a lot less hairy.
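
As a rough illustration of what the scraping code can look like, here is a minimal sketch. It assumes the current `selenium` Python package (the WebDriver API) and a locally installed Firefox, which is not necessarily the server setup described above. The function names `rendered_page` and `scrape_with_browser` are made up for this example.

```python
def rendered_page(driver, url):
    """Load `url` in the browser and return (title, HTML after JavaScript ran).

    `driver` is any Selenium WebDriver instance; the name `rendered_page`
    is hypothetical and only used for this illustration.
    """
    driver.get(url)  # the browser loads the page and runs its scripts
    return driver.title, driver.page_source


def scrape_with_browser(url):
    """Start Firefox, grab one page, and always close the browser again."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Firefox()  # assumption: Firefox is available locally
    try:
        return rendered_page(driver, url)
    finally:
        driver.quit()  # shut the browser down even if loading failed
```

Calling `scrape_with_browser("http://example.com/")` should return the page title and the HTML as the browser sees it, which you can then hand to whatever parsing code you already have.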

Paul D. Waite answered Sep 21 '22 08:09



Screen scraping involves a lot of regular expressions to get the exact data you want. You should also decide up front what data you want to analyze and how you want to store it.

To get the pages, you'll need a library such as urllib (or urllib2; in Python 3 both live in urllib.request). To pull the data out of them, use regular expressions (the re module), or let BeautifulSoup do your dirty work (http://www.crummy.com/software/BeautifulSoup/).
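
To make that concrete, here is a minimal sketch using only the standard library on Python 3 (`urllib.request` plus `re`). The helper names, the `User-Agent` string, and the deliberately simple `href` pattern are assumptions for illustration; a real parser such as BeautifulSoup will usually cope better with messy HTML.

```python
import re
import urllib.request

# Matches the value of double-quoted href attributes. Deliberately simple:
# it ignores single-quoted and unquoted attributes.
HREF_PATTERN = re.compile(r'href="([^"]+)"', re.IGNORECASE)


def fetch_html(url, timeout=10):
    """Download `url` and return the response body as text."""
    request = urllib.request.Request(
        url,
        headers={"User-Agent": "example-scraper/0.1"},  # hypothetical name
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_links(html):
    """Return every double-quoted href value found in `html`, in order."""
    return HREF_PATTERN.findall(html)
```

Something like `extract_links(fetch_html("http://example.com/"))` then gives you the raw link targets of a page, and it runs as an ordinary script with no server or CGI involved.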

If you want to build a pure bot that does what the search engines do (a crawler), it also has to be smart enough not to keep hitting the same domain continuously, since that can amount to a denial-of-service (DoS) attack.
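
One simple way to avoid hammering a single domain is to enforce a minimum delay between requests to the same host. The sketch below is only one possible approach; the class name `DomainThrottle` and the one-second default are assumptions, not something from a particular library.

```python
import time
from urllib.parse import urlparse


class DomainThrottle:
    """Enforce a minimum delay between requests to the same domain.

    Hypothetical helper for illustration; `clock` and `sleep` can be
    swapped out, which makes the class easy to test.
    """

    def __init__(self, min_delay=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self._clock = clock
        self._sleep = sleep
        self._last_request = {}  # domain -> time of the last request

    def wait(self, url):
        """Sleep if needed before requesting `url`; return seconds slept."""
        domain = urlparse(url).netloc
        last = self._last_request.get(domain)
        waited = 0.0
        if last is not None:
            remaining = self.min_delay - (self._clock() - last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        # Record the moment this request is allowed to go out.
        self._last_request[domain] = self._clock()
        return waited
```

Call `throttle.wait(url)` right before each download. Requests to different domains are not delayed, while repeated requests to the same domain are spaced at least `min_delay` seconds apart.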

Duniyadnd answered Sep 21 '22 08:09
