Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Strange blocking behaviour with gevent/grequests over HTTPS

The following code sends a request every 200ms and should handle the responses asynchronously whenever they come.

Over HTTP it works as expected - a request is sent every 200ms and the response callback is called independently whenever a response arrives. Over HTTPS however, the requests get significantly delayed whenever a response arrives (even though my response handler does no work). The response callback seems to be called twice for each request, once with a zero-length response (edit: this is because of a redirect and seems unrelated to the blocking issue, thanks Padraic).

What could be causing this blocking behaviour over HTTPS? (www.bbc.co.uk is just an example that's geographically far from me, but it happens with all servers I've tested).

grequests_test.py

import time
import sys
import grequests
import gevent

def cb(res, **kwargs):
    print("**** Response", time.time(), len(res.text))

for i in range(10):
    unsent = grequests.get(sys.argv[1], hooks={'response': cb})
    print("Request", time.time())
    grequests.send(unsent, grequests.Pool(1))
    gevent.sleep(0.2)
gevent.sleep(5)

$ ipython2 grequests_test.py 'http://www.bbc.co.uk' (expected result)

('Request', 1459050191.499266)
('Request', 1459050191.701466)
('Request', 1459050191.903223)
('Request', 1459050192.10403)
('Request', 1459050192.305626)
('**** Response', 1459050192.099185, 179643)
('Request', 1459050192.506476)
('**** Response', 1459050192.307869, 179643)
('Request', 1459050192.707745)
('**** Response', 1459050192.484711, 179643)
('Request', 1459050192.909376)
('**** Response', 1459050192.696583, 179643)
('Request', 1459050193.110528)
('**** Response', 1459050192.870476, 179643)
('Request', 1459050193.311601)
('**** Response', 1459050193.071679, 179639)
('**** Response', 1459050193.313615, 179680)
('**** Response', 1459050193.4959, 179643)
('**** Response', 1459050193.687054, 179680)
('**** Response', 1459050193.902827, 179639)

ipython2 grequests_test.py 'https://www.bbc.co.uk' (requests are sent late)

('Request', 1459050203.24336)
('Request', 1459050203.44473)
('**** Response', 1459050204.423302, 0)
('Request', 1459050204.424748) <------------- THIS REQUEST TIME IS LATE
('**** Response', 1459050205.294426, 0)
('Request', 1459050205.296722)
('Request', 1459050205.497924)
('**** Response', 1459050206.456572, 0)
('Request', 1459050206.457875)
('**** Response', 1459050207.363188, 0)
('**** Response', 1459050208.247189, 0)
('Request', 1459050208.249579)
('**** Response', 1459050208.250645, 179643)
('**** Response', 1459050208.253638, 179643)
('Request', 1459050208.451083)
('**** Response', 1459050209.426556, 0)
('Request', 1459050209.428032)
('**** Response', 1459050209.428929, 179643)
('**** Response', 1459050210.331425, 0)
('**** Response', 1459050211.247793, 0)
('Request', 1459050211.251574)
('**** Response', 1459050211.252321, 179643)
('**** Response', 1459050211.25519, 179680)
('**** Response', 1459050212.397186, 0)
('**** Response', 1459050213.299109, 0)
('**** Response', 1459050213.588854, 179588)
('**** Response', 1459050213.590434, 179643)
('**** Response', 1459050213.593731, 179643)
('**** Response', 1459050213.90507, 179643)
('**** Response', 1459050213.909386, 179643)

Notice that the first response appears to arrive long after the next request should have been sent but wasn't. Why didn't the sleep return, and the next request get sent, before that first response arrived?

like image 429
akxlr Avatar asked Mar 27 '16 04:03

akxlr


2 Answers

The extra responses and the 0 length responses are easily explained, if you add a print(res.status_code) you will see a lot of 301's as in the case of https://www.bbc.co.uk you get redirected to http://www.bbc.co.uk so that is why you see extras responses and 0 returned for the len(res.text), you can see the output below:

In [11]: def cb(res, **kwargs):
   ....:         print(res.status_code)
   ....:         print("**** Response", time.time(), len(res.text))
   ....:     

In [12]: for i in range(10):
   ....:         unsent = grequests.get("https://www.bbc.co.uk", hooks={'response': cb})
   ....:         print("Request", time.time())
   ....:         grequests.send(unsent, grequests.Pool(1))
   ....:         gevent.sleep(0.2)
   ....: gevent.sleep(5)
   ....: 
('Request', 1459368704.32843)
301
('**** Response', 1459368704.616453, 0)
('Request', 1459368704.618786)
301
('**** Response', 1459368704.937159, 0)
('Request', 1459368704.941069)
200
('**** Response', 1459368704.943034, 141486)
301
('**** Response', 1459368705.496423, 0)
('Request', 1459368705.498991)
200
('**** Response', 1459368705.50162, 141448)
301
('**** Response', 1459368705.784145, 0)
('Request', 1459368705.785769)
200
('**** Response', 1459368705.786772, 141486)
301
('**** Response', 1459368706.110865, 0)
('Request', 1459368706.114921)
200
('**** Response', 1459368706.116124, 141448)
301
('**** Response', 1459368706.396807, 0)
('Request', 1459368706.400795)
200
301
('**** Response', 1459368706.756861, 0)
('Request', 1459368706.76069)
200
('**** Response', 1459368706.763268, 141448)
('**** Response', 1459368706.488708, 141448)
301
('**** Response', 1459368707.065011, 0)
('Request', 1459368707.069128)
200
('**** Response', 1459368707.071981, 141448)
301
('**** Response', 1459368707.366737, 0)
('Request', 1459368707.370713)
200
('**** Response', 1459368707.373597, 141448)
301
('**** Response', 1459368707.73689, 0)
200
('**** Response', 1459368707.743815, 141448)
200
('**** Response', 1459368707.902499, 141448)

If we run the same code using a site that serves over https, https://www.google.ie/ in this example:

In [14]: for i in range(10):
   ....:         unsent = grequests.get("https://www.google.ie/", hooks={'response': cb})
   ....:         print("Request", time.time())
   ....:         grequests.send(unsent, grequests.Pool(1))
   ....:         gevent.sleep(0.2)
   ....: gevent.sleep(5)
   ....: 
('Request', 1459368895.525717)
200
('**** Response', 1459368895.838453, 19682)
('Request', 1459368895.884151)
200
('**** Response', 1459368896.168789, 19650)
('Request', 1459368896.22553)
200
('**** Response', 1459368896.491304, 19632)
('Request', 1459368896.542206)
200
('**** Response', 1459368896.808875, 19650)
('Request', 1459368896.850575)
200
('**** Response', 1459368897.144725, 19705)
('Request', 1459368897.173744)
200
('**** Response', 1459368897.45713, 19649)
('Request', 1459368897.491821)
200
('**** Response', 1459368897.761675, 19657)
('Request', 1459368897.792373)
200
('**** Response', 1459368898.331791, 19683)
('Request', 1459368898.350483)
200
('**** Response', 1459368898.836108, 19713)
('Request', 1459368898.855729)
200
('**** Response', 1459368899.148171, 19666)

You will see the behaviour is different. We get 10 responses and no 0 length responses. You should check the status_code in your function to verify you get what you want. The example above explains what you see for the bbc site and most likely what is happening for the others.

like image 63
Padraic Cunningham Avatar answered Nov 13 '22 11:11

Padraic Cunningham


The current iteration of grequests contains the following:

from gevent import monkey as curious_george
curious_george.patch_all(thread=False, select=False)

The offending part is select=False - removing this or manually calling monkey.patch_select() solves the problem. I am not sure if this has other side effects.

like image 36
akxlr Avatar answered Nov 13 '22 10:11

akxlr