I'm working on a data scraping project and my code uses Scrapy (version 1.0.4) and Selenium (version 2.47.1).
import time

from scrapy import Spider
from scrapy.selector import Selector
from scrapy.http import Request
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException


class TradesySpider(Spider):
    # Plain Spider: CrawlSpider reserves parse() for its own rule handling.
    name = 'tradesy'
    start_urls = ['My Start url']

    def __init__(self, *args, **kwargs):
        super(TradesySpider, self).__init__(*args, **kwargs)
        self.driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)
        while True:
            # Re-read the page Selenium is currently showing on every pass.
            page = Selector(text=self.driver.page_source)
            tradesy_urls = page.xpath('//div[@id="right-panel"]')
            data_urls = tradesy_urls.xpath('div[@class="item streamline"]/a/@href').extract()
            for link in data_urls:
                url = 'My base url' + link
                yield Request(url=url, callback=self.parse_data)
            time.sleep(10)
            try:
                data_path = self.driver.find_element_by_xpath('//*[@id="page-next"]')
            except NoSuchElementException:
                # No "next" button: last page reached.
                break
            data_path.click()
            time.sleep(10)

    def parse_data(self, response):
        'Scrapy Operations...'
When I run my code, I get the expected output for some URLs, but for others I get the following error:
2016-01-19 15:45:17 [scrapy] DEBUG: Retrying <GET MY_URL> (failed 1 times): [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'SSL3_READ_BYTES', 'ssl handshake failure')]>]
Please provide a solution for this query.
Using Scrapy 1.5.0 I was running into this error:
Error downloading: https://my.website.com>: [<twisted.python.failure.Failure OpenSSL.SSL.Error: [('SSL routines', 'tls12_check_peer_sigalg', 'wrong curve')]>]
What ended up working was updating Twisted (from 17.9.0 to 19.10.0). I also updated Scrapy to 2.4.0, along with a few other packages.
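If you want to check whether your environment is still on the older releases, something like the sketch below might help. It assumes Python 3.8+ (for `importlib.metadata`), and the names `KNOWN_GOOD`, `parse_version`, `outdated` and `installed_versions` are made up for this example. The version numbers are simply the ones that worked for me, not official minimums.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported as working in this answer; a reference point, not a hard requirement.
KNOWN_GOOD = {"Twisted": "19.10.0", "Scrapy": "2.4.0"}


def parse_version(text):
    """Turn '19.10.0' into (19, 10, 0); anything after the leading digits of a part is ignored."""
    parts = []
    for chunk in text.split("."):
        digits = ""
        for ch in chunk:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def outdated(installed, known_good=KNOWN_GOOD):
    """Return {name: (installed, wanted)} for packages older than the known-good version."""
    stale = {}
    for name, wanted in known_good.items():
        have = installed.get(name)
        if have is not None and parse_version(have) < parse_version(wanted):
            stale[name] = (have, wanted)
    return stale


def installed_versions(names):
    """Look up installed versions; packages that are not installed map to None."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, (have, wanted) in outdated(installed_versions(KNOWN_GOOD)).items():
        print(f"{name} {have} is older than {wanted}; consider: pip install --upgrade {name}")
```

Anything it reports can then be upgraded with `pip install --upgrade <package>`. Whether that alone clears the handshake error will depend on the server you are scraping.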