I am trying to crawl data from a list of URLs. I wrote the code below, and yesterday it ran without any errors.
But today, when I ran the same code again, it raised an error: 'EPollReactor' object has no attribute '_handleSignals'
Below is my code:
import scrapy
from scrapy.crawler import CrawlerProcess

# df is a pandas DataFrame with a 'Link' column, defined elsewhere
class MySpider(scrapy.Spider):
    name = 'myspider'

    def start_requests(self):
        urls = df['Link']
        for index, url in enumerate(urls):
            yield scrapy.Request(url=url, meta={'Index': index, 'Item': ''})

    def parse(self, response):
        Item = response.meta['Item']
        Index = response.meta['Index']
        content = ''
        # Concatenate the text of every <p> element on the page
        for para in response.css('p::text').extract():
            Item = Item + para
        df.loc[Index, "Content"] = Item

process = CrawlerProcess()
process.crawl(MySpider)
process.start()
I searched for this error, but I don't fully understand it, so I can't fix it. Could you please help me fix it?
Thanks
Did you reinstall Scrapy? I had the same issue today: code that worked previously started giving the error you described. The error appears to come from one of Scrapy's dependencies, the Twisted package. A new release of Twisted (version 23.8.0) came out about 4 hours ago, and it seems to have some compatibility issues with Scrapy. If you pip install scrapy and let Twisted be installed as a dependency, pip installs the new version and Scrapy throws this error. I solved it by running
pip install Twisted==22.10.0
This installs the previous release of Twisted, and it solved the problem for me.
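If you want to confirm which Twisted version your environment actually picked up, something like the following might help. It is only a rough sketch: the 23.8.0 threshold comes from what I observed above, the helper names (parse_version, twisted_looks_incompatible, installed_twisted_looks_incompatible) are made up for illustration, and newer Scrapy releases may well work with newer Twisted versions.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumption based on this thread: Twisted 23.8.0 is the first release
# that triggers the '_handleSignals' error with the Scrapy version in use.
FIRST_BAD_TWISTED = (23, 8, 0)


def parse_version(text):
    """Turn a string such as '22.10.0' into a 3-tuple of ints."""
    numbers = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep leading digits only
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:  # pad short versions such as '23.8'
        numbers.append(0)
    return tuple(numbers)


def twisted_looks_incompatible(twisted_version):
    """Return True if the given Twisted version is 23.8.0 or newer."""
    return parse_version(twisted_version) >= FIRST_BAD_TWISTED


def installed_twisted_looks_incompatible():
    """Check the Twisted version installed in the current environment."""
    try:
        return twisted_looks_incompatible(version("Twisted"))
    except PackageNotFoundError:
        return None  # Twisted is not installed here


if __name__ == "__main__":
    print(installed_twisted_looks_incompatible())
```

If this prints True, downgrading Twisted as shown above is what worked for me; None just means Twisted is not installed in that environment.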
Scrapy 2.10.1 has been released (release notes). Its only change is aimed at fixing this:
Marked Twisted >= 23.8.0 as unsupported. (issue 6024, issue 6026)
So updating Scrapy to its latest version (2.10.1 at the moment) should solve this for now.
If your project requires an older version of Scrapy, pin Twisted to an older version as suggested in the other answer.