
Understanding requests versus grequests

I'm working with a process which is basically as follows:

  1. Take some list of urls.
  2. Get a Response object from each.
  3. Create a BeautifulSoup object from the text of each Response.
  4. Pull the text of a certain tag from that BeautifulSoup object.

From my understanding, this seems ideal for grequests:

GRequests allows you to use Requests with Gevent to make asynchronous HTTP Requests easily.

Yet the two approaches (one with requests, one with grequests) give me different results: some of the requests made through grequests return None rather than a response.

Using requests

import requests

tickers = [
    'A', 'AAL', 'AAP', 'AAPL', 'ABBV', 'ABC', 'ABT', 'ACN', 'ADBE', 'ADI', 
    'ADM',  'ADP', 'ADS', 'ADSK', 'AEE', 'AEP', 'AES', 'AET', 'AFL', 'AGN', 
    'AIG', 'AIV', 'AIZ', 'AJG', 'AKAM', 'ALB', 'ALGN', 'ALK', 'ALL', 'ALLE',
    ]

BASE = 'https://finance.google.com/finance?q={}'

rs = (requests.get(u) for u in [BASE.format(t) for t in tickers])
rs = list(rs)

rs
# [<Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # ...
 # <Response [200]>]

# All are okay (status_code == 200)

Using grequests

# Restarted my interpreter and redefined `tickers` and `BASE`
import grequests

rs = (grequests.get(u) for u in [BASE.format(t) for t in tickers])
rs = grequests.map(rs)

rs
# [None,
 # <Response [200]>,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # None,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>,
 # <Response [200]>]

Why the difference in results?

Update: I can print the exception as follows. There is a related discussion here, but I have no idea what's going on.

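def exception_handler(request, exception):
    # Called by grequests for each request that raised instead of returning
    print(exception)

# Rebuild the unsent requests first: `rs` above now holds the mapped results
rs = (grequests.get(u) for u in [BASE.format(t) for t in tickers])
rs = grequests.map(rs, exception_handler=exception_handler)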

# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
# ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
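The handler above only prints the message, so it does not show which URLs failed. A minimal sketch of a handler that records them is below. The class name `FailureLog` is hypothetical, and it assumes the request object passed to the handler exposes the target as `request.url`.

```python
class FailureLog:
    """Callable exception handler that remembers which requests failed."""

    def __init__(self):
        # One (url, exception type name, message) tuple per failed request
        self.failures = []

    def __call__(self, request, exception):
        # grequests calls the handler with the failed request and its exception
        self.failures.append(
            (request.url, type(exception).__name__, str(exception))
        )
        # Returning None keeps None in that request's slot of the map() result
        return None


# Intended use (needs grequests and network access, so it is left commented out):
# log = FailureLog()
# rs = grequests.map(reqs, exception_handler=log)
# print(log.failures)
```

Knowing the failing URLs makes it easier to see whether the errors cluster among the first requests sent or are spread across the whole batch.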

System/version info

  • requests: 2.18.4
  • grequests: 0.3.0
  • Python: 3.6.3
  • urllib3: 1.22
  • pyopenssl: 17.2.0
  • All via Anaconda
  • System: same issue on both macOS High Sierra and Windows 10, build 10.0.16299
Brad Solomon asked Sep 13 '17 19:09


2 Answers

You are simply sending requests too fast. Because grequests is an async library, all of these requests are sent almost simultaneously, and that is too many at once.

You just need to limit the number of concurrent tasks with grequests.map(rs, size=your_choice). I have tested grequests.map(rs, size=10) and it works well.
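As a rough sketch of how the `size` limit might be combined with a check for failed requests: the helper names `split_results` and `fetch_limited` are hypothetical, and the default of 10 is only the value tested above, not a recommended setting.

```python
def split_results(urls, responses):
    """Pair each URL with its response; collect URLs whose result is None."""
    ok, failed = {}, []
    for url, resp in zip(urls, responses):
        if resp is None:
            failed.append(url)
        else:
            ok[url] = resp
    return ok, failed


def fetch_limited(urls, size=10):
    """Send all requests with at most `size` in flight at any time."""
    # Imported here so split_results() can be used without grequests installed
    import grequests

    pending = (grequests.get(u) for u in urls)
    # map() returns results in the same order as the requests,
    # with None for any request that failed
    responses = grequests.map(pending, size=size)
    return split_results(urls, responses)
```

Calling `fetch_limited([BASE.format(t) for t in tickers])` would return the successful responses keyed by URL, plus a list of URLs that could be retried with a smaller `size`.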

Sraw answered Oct 12 '22 22:10


I do not know the exact reason for the observed behavior with .map(). However, using the .imap() function with size=1 always returned a 'Response 200' during a few minutes of testing. Here is the code snippet:

rs = (grequests.get(u) for u in [BASE.format(t) for t in tickers])
rsm_iterator = grequests.imap(rs, exception_handler=exception_handler, size=1)
rsm_list = [r for r in rsm_iterator]
print(rsm_list)

If you don't want to wait for all requests to finish before working on their responses, you can iterate over the results as they arrive:

rs = (grequests.get(u) for u in [BASE.format(t) for t in tickers])
rsm_iterator = grequests.imap(rs, exception_handler=exception_handler, size=1)
for r in rsm_iterator:
    print(r)
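To illustrate what working on each response as it arrives could look like for the question's parsing steps, here is a sketch. It deliberately swaps in the standard library's `html.parser` for BeautifulSoup so that it runs without extra packages. The names `first_tag_text` and `iter_tag_texts` and the choice of the `title` tag are assumptions for illustration only.

```python
from html.parser import HTMLParser


class _FirstTagText(HTMLParser):
    """Collect the text inside the first occurrence of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # greater than 0 while inside the target tag
        self.done = False   # True once the first occurrence has closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag and not self.done:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth and not self.done:
            self.parts.append(data)


def first_tag_text(html, tag):
    """Return the stripped text of the first `tag` element, or None."""
    parser = _FirstTagText(tag)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip() or None


def iter_tag_texts(responses, tag="title"):
    """Yield (url, text) for each response as it is consumed, skipping None."""
    for r in responses:
        if r is not None:
            yield r.url, first_tag_text(r.text, tag)
```

Because `iter_tag_texts` is a generator, passing it the iterator returned by `grequests.imap(...)` would parse each page as soon as its response is available rather than after the whole batch has finished.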
fabianegli answered Oct 12 '22 23:10