
Python Requests hanging/freezing

I'm using the requests library to get a lot of webpages. Here's the pertinent code:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

response = requests.Session()
retries = Retry(total=5, backoff_factor=0.1)
response.mount('http://', HTTPAdapter(max_retries=retries))
response = response.get(url)

After a while it just hangs/freezes (never on the same webpage) while getting the page. Here's the traceback when I interrupt it:

File "/Users/Student/Hockey/Scrape/html_pbp.py", line 21, in get_pbp
  response = r.read().decode('utf-8')
File "/anaconda/lib/python3.6/http/client.py", line 456, in read
  return self._readall_chunked()
File "/anaconda/lib/python3.6/http/client.py", line 566, in _readall_chunked
  value.append(self._safe_read(chunk_left))
File "/anaconda/lib/python3.6/http/client.py", line 612, in _safe_read
  chunk = self.fp.read(min(amt, MAXAMOUNT))
File "/anaconda/lib/python3.6/socket.py", line 586, in readinto
  return self._sock.recv_into(b)
KeyboardInterrupt

Does anybody know what could be causing it? Or (more importantly) does anybody know a way to stop it if it takes more than a certain amount of time so that I could try again?

asked Jul 23 '17 by Hobbit36

3 Answers

Seems like setting a (read) timeout might help you.

Something along the lines of:

response = response.get(url, timeout=5)

(This will set both connect and read timeout to 5 seconds.)

In requests, unfortunately, neither the connect timeout nor the read timeout is set by default, even though the docs say it's good practice to set one:

Most requests to external servers should have a timeout attached, in case the server is not responding in a timely manner. By default, requests do not time out unless a timeout value is set explicitly. Without a timeout, your code may hang for minutes or more.

Just for completeness, the connect timeout is the number of seconds requests will wait for your client to establish a connection to a remote machine, and the read timeout is the number of seconds the client will wait between bytes sent from the server.
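
You can also pass the two values separately as a (connect, read) tuple, and catch requests.exceptions.Timeout to retry or skip the page, which addresses the "try again" part of the question. A minimal sketch (the numbers and the placeholder URL are just examples, not from the question):

import requests

url = "http://example.com/page"  # placeholder URL

try:
    # 3.05 s to establish the connection, 10 s allowed between bytes read.
    response = requests.get(url, timeout=(3.05, 10))
except requests.exceptions.Timeout:
    # Connect or read timeout exceeded: retry, skip, or log this page.
    pass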

answered Nov 20 '22 by randomir


Patching the documented "send" function will fix this for all requests, even in many dependent libraries and SDKs. When patching libraries, be sure to patch supported/documented functions; otherwise you may wind up silently losing the effect of your patch.

import requests

DEFAULT_TIMEOUT = 180  # seconds

# Keep a reference to the original send, then wrap it so every request
# gets a timeout unless the caller passed one explicitly.
old_send = requests.Session.send

def new_send(*args, **kwargs):
    if kwargs.get("timeout", None) is None:
        kwargs["timeout"] = DEFAULT_TIMEOUT
    return old_send(*args, **kwargs)

requests.Session.send = new_send

The effects of not having any timeout are quite severe, and the use of a default timeout can almost never break anything - because TCP itself has timeouts as well.

On Windows the default TCP timeout is 240 seconds, and the TCP RFC recommends a minimum of 100 seconds for RTO * retries. Somewhere in that range is a safe default.
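
As an illustration (a sketch, assuming the patch above has already run; the URL is a placeholder), the retry-mounted session from the question then gets the 180-second default on every call without any per-call changes:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Session.send is assumed to be patched as shown above.
session = requests.Session()
retries = Retry(total=5, backoff_factor=0.1)
session.mount('http://', HTTPAdapter(max_retries=retries))

url = "http://example.com/page"  # placeholder URL
# No timeout= argument here, yet the request can no longer hang forever:
# the patched send() fills in DEFAULT_TIMEOUT for us.
response = session.get(url)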

answered Nov 20 '22 by Erik Aronesty


To set a timeout globally instead of specifying it in every request:


import os

import requests
from requests.adapters import TimeoutSauce

REQUESTS_TIMEOUT_SECONDS = float(os.getenv("REQUESTS_TIMEOUT_SECONDS", 5))

class CustomTimeout(TimeoutSauce):
    def __init__(self, *args, **kwargs):
        # Only fill in the defaults when no explicit timeout was given.
        if kwargs["connect"] is None:
            kwargs["connect"] = REQUESTS_TIMEOUT_SECONDS
        if kwargs["read"] is None:
            kwargs["read"] = REQUESTS_TIMEOUT_SECONDS
        super().__init__(*args, **kwargs)

# Set it globally, instead of specifying ``timeout=..`` kwarg on each call.
requests.adapters.TimeoutSauce = CustomTimeout

sess = requests.Session()
sess.get(...)
sess.post(...)
answered Nov 20 '22 by Danila Vershinin