I am trying to scrape data off a website using Scrapy, a Python framework. I can get the data from the website using the spiders, but the problem occurs when I try to navigate through the website.
According to this post, Scrapy does not handle JavaScript well. Also, as stated in the accepted answer, I cannot use mechanize or lxml; it suggests using a combination of Selenium and Scrapy instead.
Function of the button:
I am browsing through offers on a website. The function of the button is to show more offers: on clicking it, it calls a JavaScript function which loads the results.
I was also looking at CasperJS and PhantomJS. Will they work?
I just need to automate the clicking of a button. How do I go about this?
First of all, yes - you can use PhantomJS via GhostDriver with Python. It is built into python-selenium:
pip install selenium
Demo:
>>> from selenium import webdriver
>>> driver = webdriver.PhantomJS()
>>> driver.get('https://stackoverflow.com/questions/27813251')
>>> driver.title
u'javascript - Web scraping: Automating button click - Stack Overflow'
There are also several other threads on the site that provide examples of "scrapy + selenium" spiders.
There is also a scrapy-webdriver module that can probably help with it too.
Note, though, that using scrapy with selenium gives you a huge overhead and slows things down dramatically, even with a headless PhantomJS browser.
There is a huge chance you can mimic that "show more offers" button click by simulating the underlying request that fetches the data you need. Use the browser developer tools to explore what kind of request is fired, and use scrapy.http.Request to simulate it inside the spider.