 

Web scraping: Automating button click [closed]

I am trying to scrape data off a website using Scrapy, a Python framework. I can get data from the website using the spiders, but the problem occurs when I try to navigate through the website.

According to this post, Scrapy does not handle JavaScript well.

Also, as stated in the accepted answer, I cannot use mechanize or lxml. It suggests using a combination of Selenium and Scrapy.

Function of the button:

I am browsing through offers on a website. The function of the button is to show more offers. So on clicking it, it calls a JavaScript function which loads the results.

I was also looking at CasperJS and PhantomJS. Will they work?

I just need to automate the clicking of a button. How do I go about this?

asked Jan 07 '15 by praxmon

1 Answer

First of all, yes, you can use the PhantomJS GhostDriver with Python. It is built into python-selenium:

pip install selenium

Demo:

>>> from selenium import webdriver
>>> driver = webdriver.PhantomJS()
>>> driver.get('https://stackoverflow.com/questions/27813251')
>>> driver.title
u'javascript - Web scraping: Automating button click - Stack Overflow'
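Since the actual goal is to click a button, here is a minimal sketch of the click itself with Selenium. The URL, the button.show-more selector, and the .offer class are placeholders for whatever the real page uses:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.PhantomJS()
driver.get('http://example.com/offers')  # placeholder URL

# count the offers already on the page (placeholder selector)
before = len(driver.find_elements(By.CSS_SELECTOR, '.offer'))

# wait for the "show more offers" button to become clickable, then click it
button = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, 'button.show-more'))
)
button.click()

# wait until the JavaScript has rendered additional offers
WebDriverWait(driver, 10).until(
    lambda d: len(d.find_elements(By.CSS_SELECTOR, '.offer')) > before
)
html = driver.page_source  # rendered HTML, ready for parsing
driver.quit()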

There are also several other threads that provide examples of "scrapy+selenium" spiders (a minimal sketch of the pattern follows the list):

  • selenium with scrapy for dynamic page
  • Scraping with Scrapy and Selenium
  • seleniumcrawler
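
As a rough illustration of the pattern those threads describe, here is a sketch of a spider that drives PhantomJS from inside Scrapy. The start URL and CSS selectors are assumptions, not the real site's markup:

import scrapy
from selenium import webdriver


class OffersSpider(scrapy.Spider):
    # Sketch: Selenium renders the JavaScript, Scrapy parses the result
    name = 'offers'
    start_urls = ['http://example.com/offers']  # placeholder

    def __init__(self, *args, **kwargs):
        super(OffersSpider, self).__init__(*args, **kwargs)
        self.driver = webdriver.PhantomJS()

    def parse(self, response):
        self.driver.get(response.url)
        # click the (hypothetical) "show more offers" button; in practice
        # you would add an explicit wait here, as in the sketch above
        self.driver.find_element_by_css_selector('button.show-more').click()
        # hand the rendered HTML back to Scrapy's selectors
        sel = scrapy.Selector(text=self.driver.page_source)
        for offer in sel.css('.offer'):  # placeholder selector
            yield {'title': offer.css('::text').extract_first()}

    def closed(self, reason):
        # Scrapy calls this automatically when the spider finishes
        self.driver.quit()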

There is also a scrapy-webdriver module that can probably help with this too.


That said, using Scrapy with Selenium adds a huge overhead and slows things down dramatically, even with a headless PhantomJS browser.

There is a huge chance you can mimic that "show more offers" button click by simulating the underlying request that fetches the data you need. Use the browser's developer tools to see what kind of request is fired when the button is clicked, and use scrapy.http.Request to reproduce it inside the spider (a sketch follows).
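
For example, if the button turns out to fire an XHR that returns JSON, the spider could page through the endpoint directly. The URL, query parameter, and field names below are hypothetical; check the Network tab of the developer tools for the real ones:

import json

import scrapy
from scrapy.http import Request


class OffersApiSpider(scrapy.Spider):
    # Sketch of paging through the (hypothetical) endpoint behind the button
    name = 'offers_api'
    start_urls = ['http://example.com/offers/more?page=1']  # placeholder

    def parse(self, response):
        data = json.loads(response.text)
        for offer in data.get('offers', []):
            yield {'title': offer.get('title')}  # placeholder field names

        # request the next batch, exactly as another button click would
        next_page = data.get('next_page')
        if next_page:
            yield Request(
                'http://example.com/offers/more?page=%d' % next_page,
                callback=self.parse,
            )

This avoids running a browser at all, which is usually both faster and more reliable than clicking through the page.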

answered Oct 03 '22 by alecxe