
'NoneType' object has no attribute '_app_data' in scrapy\twisted\openssl

While scraping with Scrapy, the following error appears in my logs from time to time. It doesn't seem to originate anywhere in my code, and it looks like something inside twisted\openssl. Any ideas what causes this and how to get rid of it?

Stacktrace here:

[Launcher,27487/stderr] Error during info_callback
    Traceback (most recent call last):
      File "/opt/webapps/link_crawler/lib/python2.7/site-packages/twisted/protocols/tls.py", line 415, in dataReceived
        self._write(bytes)
      File "/opt/webapps/link_crawler/lib/python2.7/site-packages/twisted/protocols/tls.py", line 554, in _write
        sent = self._tlsConnection.send(toSend)
      File "/opt/webapps/link_crawler/lib/python2.7/site-packages/OpenSSL/SSL.py", line 1270, in send
        result = _lib.SSL_write(self._ssl, buf, len(buf))
      File "/opt/webapps/link_crawler/lib/python2.7/site-packages/OpenSSL/SSL.py", line 926, in wrapper
        callback(Connection._reverse_mapping[ssl], where, return_code)
    --- <exception caught here> ---
      File "/opt/webapps/link_crawler/lib/python2.7/site-packages/twisted/internet/_sslverify.py", line 1055, in infoCallback
        return wrapped(connection, where, ret)
      File "/opt/webapps/link_crawler/lib/python2.7/site-packages/twisted/internet/_sslverify.py", line 1157, in _identityVerifyingInfoCallback
        transport = connection.get_app_data()
      File "/opt/webapps/link_crawler/lib/python2.7/site-packages/OpenSSL/SSL.py", line 1589, in get_app_data
        return self._app_data
      File "/opt/webapps/link_crawler/lib/python2.7/site-packages/OpenSSL/SSL.py", line 1148, in __getattr__
        return getattr(self._socket, name)
    exceptions.AttributeError: 'NoneType' object has no attribute '_app_data'
Aldarund asked May 12 '15 22:05


2 Answers

I was able to solve this problem by installing the service_identity package:

pip install service_identity
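To confirm that the package is visible to the interpreter that runs the crawler, a quick check along these lines may help. This is only a sketch: the helper name `is_installed` is made up for illustration, and it uses `importlib.util.find_spec`, which needs Python 3.4+ (the traceback above is from Python 2.7, where a plain `import service_identity` inside a `try`/`except ImportError` would serve the same purpose).

```python
import importlib.util


def is_installed(module_name):
    """Return True if the current interpreter can locate `module_name`.

    `is_installed` is a hypothetical helper, not part of Scrapy or Twisted.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Run this with the same virtualenv/interpreter the crawler uses.
    if is_installed("service_identity"):
        print("service_identity is importable")
    else:
        print("service_identity is missing; try: pip install service_identity")
```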

geca answered Oct 14 '22 11:10


At first glance, it appears as though this is due to a bug in Scrapy. Scrapy defines its own Twisted "context factory": https://github.com/scrapy/scrapy/blob/ad36de4e6278cf635509a1ade30cca9a506da682/scrapy/core/downloader/contextfactory.py#L21-L28

This code instantiates ClientTLSOptions with the context it intends to return. A side effect of instantiating this class is that an "info callback" is installed on the context. The info callback requires that the Twisted TLS implementation has been set as "app data" on the connection. However, since nothing ever uses the ClientTLSOptions instance (it is discarded immediately), the app data is never set.

When the info callback comes back around to get the Twisted TLS implementation (necessary to do part of its job) it instead finds there is no app data and fails with the exception you've reported.
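The last two frames of the traceback can be reproduced in isolation. The sketch below is not pyOpenSSL code: `FakeConnection` is a hypothetical stand-in that copies only the two lines visible in the traceback (`return self._app_data` and `return getattr(self._socket, name)`) and assumes, as the traceback suggests, that the connection has no underlying socket and that `_app_data` only exists once something sets it.

```python
class FakeConnection(object):
    """Hypothetical, simplified stand-in for OpenSSL.SSL.Connection."""

    def __init__(self, socket=None):
        # Assumption: no real socket is attached, so this stays None.
        self._socket = socket

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails; unknown
        # attributes are delegated to the wrapped socket.
        return getattr(self._socket, name)

    def set_app_data(self, data):
        self._app_data = data

    def get_app_data(self):
        # If set_app_data() was never called, this lookup falls through
        # to __getattr__, which then asks None for '_app_data'.
        return self._app_data


if __name__ == "__main__":
    conn = FakeConnection()
    try:
        conn.get_app_data()
    except AttributeError as exc:
        print("without app data:", exc)

    conn.set_app_data("transport")
    print("with app data:", conn.get_app_data())
```

In this simplified model the exception disappears as soon as something calls `set_app_data()`, which is the step the answer says never happens because the ClientTLSOptions instance is discarded.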

The side effect of ClientTLSOptions is a little unpleasant, but I think this is clearly a Scrapy bug caused by misuse of ClientTLSOptions. I don't think this code could ever have been well tested, since this error will happen every single time a certificate fails to verify.

I suggest reporting the bug to Scrapy. Hopefully they can fix their use of ClientTLSOptions and eliminate this error for you.

Jean-Paul Calderone answered Oct 14 '22 12:10