I have a long-running Twitter scraping script that occasionally hangs, with stack traces ending in
"/usr/lib/python2.7/ssl.py", line 305: self._sslobj.do_handshake()
My question is "why?" and "what can I do to fix it?".
It typically runs for a week or so before this happens. This last time, 3 of the 4 threads that deal with tweepy hung (the fourth was waiting for info from a hung thread). Strangely, there was quite a long delay between the hangs: first the thread calling api.followers_ids() hung, then about 12 minutes later the thread calling api.friends_ids() hung, then 1 hour 12 minutes later (!) the thread calling api.search() hung. There were many API calls in between all of these.
I have a little code in there to dump its stack traces when I send a QUIT signal, and I got something like the following for the hung threads. They are all identical from (and including) the second entry (the tweepy/binder.py, line 185, in _call part). The other two got there from tweepy/cursor.py, line 85, in next and tweepy/cursor.py, line 60, in next:
File "myTwitterScrapingScript.py", line 245, in checkStatus
status = api.rate_limit_status()
File "/scratch/bin/python-virtual-environments/tweepy-2/local/lib/python2.7/site-packages/tweepy/binder.py", line 185, in _call
return method.execute()
File "/scratch/bin/python-virtual-environments/tweepy-2/local/lib/python2.7/site-packages/tweepy/binder.py", line 146, in execute
conn.request(self.method, url, headers=self.headers, body=self.post_data)
File "/usr/lib/python2.7/httplib.py", line 958, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python2.7/httplib.py", line 992, in _send_request
self.endheaders(body)
File "/usr/lib/python2.7/httplib.py", line 954, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 814, in _send_output
self.send(msg)
File "/usr/lib/python2.7/httplib.py", line 776, in send
self.connect()
File "/usr/lib/python2.7/httplib.py", line 1161, in connect
self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file)
File "/usr/lib/python2.7/ssl.py", line 381, in wrap_socket
ciphers=ciphers)
File "/usr/lib/python2.7/ssl.py", line 143, in __init__
self.do_handshake()
File "/usr/lib/python2.7/ssl.py", line 305, in do_handshake
self._sslobj.do_handshake()
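The dumping code itself is not shown here, so the following is only a minimal sketch of how a stack dump on a QUIT signal could be wired up using the standard library. The names format_all_stacks and install_stack_dumper are hypothetical, not taken from the actual script.

```python
import signal
import sys
import threading
import traceback


def format_all_stacks():
    """Return one string holding the current stack of every live thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for thread_id, frame in sys._current_frames().items():
        chunks.append("Thread %s (%s):\n" % (thread_id, names.get(thread_id, "?")))
        chunks.append("".join(traceback.format_stack(frame)))
    return "".join(chunks)


def _dump_stacks(signum, frame):
    # Python runs signal handlers in the main thread; the process keeps going.
    sys.stderr.write(format_all_stacks())


def install_stack_dumper():
    """Register the handler; call once from the main thread at startup."""
    # SIGQUIT is not available on Windows, hence the guard.
    if hasattr(signal, "SIGQUIT"):
        signal.signal(signal.SIGQUIT, _dump_stacks)
        return True
    return False
```

After install_stack_dumper() has been called, sending the signal (for example with kill -QUIT and the process id) writes every thread's stack to stderr without stopping the script.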
There were a few tweepy errors around the times the threads hung. That's not too unusual, though the number is slightly higher than normal. The fourth one looks interesting, though: it happened immediately before that thread hung.
1. [Errno 110] Connection timed out, about 7 minutes before the last followers_ids() call (with many assorted API calls in between).
2. [Errno 104] Connection reset by peer, about 3 minutes after (again, with several successful calls in between).
3. [Errno 110] Connection timed out, about 1.5 minutes before the last friends_ids() call. This was in the api.search() thread, which had been waiting since about 5 minutes before the first thread hung, a total wait of about 15 minutes.
4. [Errno 104] Connection reset by peer, about 2 milliseconds before the last news from the friends_ids() thread, and in the same thread. The pages of friend IDs just collected all appear to be OK, and there wasn't an error from those calls.
5. [Errno 104] Connection reset by peer in the search thread, about 17 minutes after the friends_ids thread hung and nearly an hour before the search thread hung.
6. A Failed to send request TweepError with no reason, about 1.5 minutes later.
7. Failed to send request errors and a [Errno 104] Connection reset by peer over the next 45 minutes.
8. search and lookup_users calls before the search thread finally hung.

The problem seems to have been that tweepy did not implement a timeout. This has been fixed in recent versions of tweepy, and the problem has not occurred since (in several months of continuous data collection).
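For anyone stuck on a library version without a timeout, one possible workaround is a process-wide default socket timeout, so that a stalled handshake raises an exception instead of blocking forever. The sketch below uses only the standard library and does not touch tweepy. The 60-second value and the helper names install_default_timeout and handshake_with_silent_server are assumptions for illustration. The second function simply demonstrates the effect against a local listener that never answers.

```python
import socket
import ssl

# Assumed value for illustration; pick something suited to your workload.
DEFAULT_TIMEOUT_SECONDS = 60


def install_default_timeout(seconds=DEFAULT_TIMEOUT_SECONDS):
    """Give sockets created afterwards a timeout unless one is set explicitly."""
    socket.setdefaulttimeout(seconds)


def handshake_with_silent_server(timeout):
    """Attempt a TLS handshake with a local listener that never replies.

    Returns the exception raised when the timeout expires, or None.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)  # connection completes, but nothing ever answers
    host, port = listener.getsockname()
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
        try:
            context = ssl.create_default_context()
            # Certificate checks are disabled only because this is a local demo.
            context.check_hostname = False
            context.verify_mode = ssl.CERT_NONE
            context.wrap_socket(raw)  # the handshake happens here
        except (socket.timeout, ssl.SSLError) as exc:
            return exc
        finally:
            raw.close()
    finally:
        listener.close()
    return None
```

Calling install_default_timeout() once at startup, before any threads open connections, should be enough for code that relies on the default. Each thread then needs to catch the resulting timeout error and retry or skip the call.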