Crawling SSL site with scrapy

Tags: python, ssl, scrapy

I have to crawl https://dms.psc.sc.gov/Web/dockets, which uses TLS v1.2, with the Scrapy framework. But the request for the URL fails to load and raises [<twisted.python.failure.Failure <class 'OpenSSL.SSL.Error'>>].

There is an issue discussed on GitHub, https://github.com/scrapy/scrapy/issues/981, but the fix there did not work for me. I have Scrapy v0.24.5 and Twisted version >= 14.

When I try to crawl another site that also uses TLS v1.2 it works, but not for https://dms.psc.sc.gov. How can I solve this issue?

asked Jun 24 '15 13:06

Hassan Raza

1 Answer

A PR fixing this problem in Scrapy has already been merged. More recently (in February 2016), another pull request fixed a similar bug.

With the most recent Scrapy version I can fetch your page without problems, but with older versions the problem still appears.

In general, if you run into HTTPS problems with Scrapy, the solution is:

Upgrade Scrapy to the newest version.
Check which version of Twisted you use; if it is not the most recent, update it (at the time of writing, versions above 14 are confirmed to handle SSL significantly better).
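
To see which versions are actually installed before and after upgrading, a small check like the following may help. It is only a sketch: it assumes Python 3.8+ (for `importlib.metadata`), the helper name `installed_versions` is made up for illustration, and pyOpenSSL is included on the assumption that it matters for the `OpenSSL.SSL.Error` in the question.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("Scrapy", "Twisted", "pyOpenSSL")):
    """Return a dict mapping each package name to its installed version.

    Packages that are not installed map to None instead of raising.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Running `scrapy version -v` on the command line reports similar information for the environment Scrapy itself runs in.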

If you still experience problems after updating Scrapy and Twisted, you may need to subclass ScrapyClientContextFactory; see the answer below for details.

More details are in this GitHub issue.

answered Sep 30 '22 03:09

Pawel Miech
