Tweepy 2.0 hanging on ssl do_handshake()

I have a long running Twitter scraping script that occasionally hangs with stack traces ending in

"/usr/lib/python2.7/ssl.py", line 305: self._sslobj.do_handshake()

My question is "why?" and "what can I do to fix it?".

The script typically runs for a week or so before this happens. This last time, 3 of the 4 threads that deal with tweepy hung (the fourth was waiting for info from a hung thread). Strangely, there were long delays between the hangs: first the thread calling api.followers_ids() hung, then about 12 minutes later the thread calling api.friends_ids() hung, then 1 hour 12 minutes later (!) the thread calling api.search() hung. There were many successful API calls in between all of these.

I have a little code in there to dump its stack traces when I send a QUIT signal, and I got something like the following for the hung threads. The traces are all identical from (and including) the second entry (the tweepy/binder.py, line 185, in _call part). The other two threads reached that point from tweepy/cursor.py, line 85, in next and tweepy/cursor.py, line 60, in next:

  File "myTwitterScrapingScript.py", line 245, in checkStatus
    status = api.rate_limit_status()
  File "/scratch/bin/python-virtual-environments/tweepy-2/local/lib/python2.7/site-packages/tweepy/binder.py", line 185, in _call
    return method.execute()
  File "/scratch/bin/python-virtual-environments/tweepy-2/local/lib/python2.7/site-packages/tweepy/binder.py", line 146, in execute
    conn.request(self.method, url, headers=self.headers, body=self.post_data)
  File "/usr/lib/python2.7/httplib.py", line 958, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 992, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 954, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 814, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 776, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 1161, in connect
    self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file)
  File "/usr/lib/python2.7/ssl.py", line 381, in wrap_socket
    ciphers=ciphers)
  File "/usr/lib/python2.7/ssl.py", line 143, in __init__
    self.do_handshake()
  File "/usr/lib/python2.7/ssl.py", line 305, in do_handshake
    self._sslobj.do_handshake()
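
For readers who want to collect traces like the one above, here is a minimal sketch of a dump-on-QUIT handler. It is not the code from the script in question, which isn't shown; the function names `format_all_thread_stacks`, `dump_stacks` and `install_stack_dump_handler` are made up for illustration. It relies on `sys._current_frames()`, a CPython-specific call.

```python
import signal
import sys
import threading
import traceback


def format_all_thread_stacks():
    """Return one string containing the current stack of every thread."""
    # Map thread ids to names so each trace is easier to attribute.
    names = dict((t.ident, t.name) for t in threading.enumerate())
    chunks = []
    for thread_id, frame in sys._current_frames().items():
        name = names.get(thread_id, "unknown")
        chunks.append("Thread %s (%s):\n" % (thread_id, name))
        chunks.extend(traceback.format_stack(frame))
    return "".join(chunks)


def dump_stacks(signum, frame):
    """Signal handler: write every thread's stack to stderr."""
    sys.stderr.write(format_all_thread_stacks())


def install_stack_dump_handler():
    """Register the handler; must be called from the main thread."""
    # SIGQUIT does not exist on Windows, so guard the registration.
    if hasattr(signal, "SIGQUIT"):
        signal.signal(signal.SIGQUIT, dump_stacks)
```

Call `install_stack_dump_handler()` once at startup, then send the process a QUIT signal (for example with `kill -QUIT <pid>`) to print where each thread currently is.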

There were a few tweepy errors around the times the threads hung. That's not too unusual, though the number is slightly higher than normal. The fourth one looks interesting, though: it happened immediately before that thread hung.

  • [Errno 110] Connection timed out about 7 minutes before the last followers_ids() call (with many assorted api calls in between)
  • [Errno 104] Connection reset by peer about 3 minutes after (again, several successful calls between)
  • [Errno 110] Connection timed out about 1.5 minutes before the last friends_ids() call. This was in the api.search() thread, which had been waiting since about 5 minutes before the first thread hung: a total wait of about 15 minutes.
  • [Errno 104] Connection reset by peer about 2 milliseconds before the last news from the friends_ids() thread, and in that same thread. The pages of friends ids just collected all appear to be OK, and there wasn't an error from those calls.
  • [Errno 104] Connection reset by peer in the search thread, about 17 minutes after the friends_ids thread hung and nearly an hour before the search thread hung.
  • A Failed to send request TweepError with no reason about 1.5 minutes later.
  • 3 more reason-less Failed to send request errors and an [Errno 104] Connection reset by peer over the next 45 minutes.
  • About 15 error-free minutes with lots of search and lookup_users calls before the search thread finally hung.
drevicko asked Nov 13 '22 06:11



1 Answer

The problem seems to have been that tweepy did not implement a timeout, so a stalled SSL handshake could block forever. Recent versions of tweepy fix this, and the problem has not recurred in several months of continuous data collection.
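
To illustrate why a timeout matters here, the following is a rough sketch using only the standard library, not tweepy's own code. It connects to a local listener that accepts the TCP connection but never answers the TLS handshake, which is roughly the situation in the trace. The helper names `handshake_with_timeout` and `demo_stalled_handshake` are hypothetical. With a timeout set on the socket, the handshake raises `socket.timeout` instead of blocking indefinitely.

```python
import socket
import ssl
import time


def handshake_with_timeout(host, port, timeout):
    """Open a TLS connection; raise socket.timeout if it stalls."""
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        context = ssl.create_default_context()
        # The timeout set on the plain socket also bounds the handshake.
        return context.wrap_socket(sock, server_hostname=host)
    except Exception:
        sock.close()
        raise


def demo_stalled_handshake(timeout=0.5):
    """Connect to a local listener that never replies to the handshake."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)               # queued connections are never served
    port = listener.getsockname()[1]
    started = time.time()
    try:
        handshake_with_timeout("127.0.0.1", port, timeout)
    except socket.timeout:
        return "timed out", time.time() - started
    finally:
        listener.close()
    return "connected", time.time() - started
```

Without the `timeout` argument the same call would wait for as long as the peer stays silent. When a library does not expose a timeout, `socket.setdefaulttimeout()` is a process-wide way to give new sockets one, though whether that suits a given program depends on what else it does with sockets.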

drevicko answered Nov 14 '22 19:11
