Python requests with HTTPAdapter hangs for hours

I have a particular URL on which my code hangs for hours (more than 3 hours), and I can't understand why.

The URL is http://www.etudes.ccip.fr/maintenance_site.php.

A direct requests.get() returns instantly, but as soon as I mount an HTTPAdapter, the code seems to sleep almost indefinitely:

import requests
from requests.adapters import HTTPAdapter

url = 'http://www.etudes.ccip.fr/maintenance_site.php'
session = requests.Session()
session.mount('http://', HTTPAdapter(max_retries=2))
session.get(url, timeout=2)
asked Nov 20 '17 by fast_cen

People also ask

What is default timeout for requests Python?

By default, requests does not apply a timeout unless you explicitly specify one. It is recommended to set a timeout for nearly all requests; otherwise a single unresponsive server can freeze your code and make your program hang.
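A minimal sketch, using the URL from the question:

    import requests

    # Without an explicit timeout this call could block indefinitely;
    # with timeout=5 it raises requests.exceptions.Timeout (or
    # ConnectTimeout) if the server does not respond within ~5 seconds.
    response = requests.get('http://www.etudes.ccip.fr/maintenance_site.php', timeout=5)
    print(response.status_code)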

What is response 200 Python?

A status code informs you of the status of the request. For example, a 200 OK status means that your request was successful, whereas a 404 NOT FOUND status means that the resource you were looking for was not found.
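A minimal status check might look like this (again using the URL from the question):

    import requests

    response = requests.get('http://www.etudes.ccip.fr/maintenance_site.php', timeout=5)
    if response.status_code == 200:
        print('OK')
    elif response.status_code == 404:
        print('Resource not found')
    else:
        print('Unexpected status:', response.status_code)

    # Alternatively, raise_for_status() raises an HTTPError for any 4xx/5xx:
    # response.raise_for_status()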

How do you retry a request in Python?

Retries are handled by urllib3's Retry class, which requests exposes through HTTPAdapter's max_retries argument; it accepts keyword arguments such as raise_on_status and status_forcelist. To make requests retry on specific HTTP status codes, use status_forcelist. For example, status_forcelist=[503] will retry on status code 503 (Service Unavailable), as sketched below.
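A sketch of such a setup; only Retry arguments that exist across urllib3 versions are used here:

    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    # Retry up to 3 times, but only when the server answers with a 503.
    retry = Retry(total=3, status_forcelist=[503], backoff_factor=1)

    session = requests.Session()
    session.mount('http://', HTTPAdapter(max_retries=retry))
    session.mount('https://', HTTPAdapter(max_retries=retry))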


2 Answers

The adapter that you initialize sets up its retry behavior like this (from requests/adapters.py):

    if max_retries == DEFAULT_RETRIES:
        self.max_retries = Retry(0, read=False)
    else:
        self.max_retries = Retry.from_int(max_retries)
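You can see this from the REPL: the adapter stores max_retries as a Retry object (a quick sketch; the exact repr varies across urllib3 versions):

    from requests.adapters import HTTPAdapter

    print(HTTPAdapter().max_retries)               # Retry(total=0, read=False, ...)
    print(HTTPAdapter(max_retries=2).max_retries)  # Retry(total=2, ...)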

And if you look at the initialization of Retry in urllib3:

def __init__(self, total=10, connect=None, read=None, redirect=None, status=None,
             method_whitelist=DEFAULT_METHOD_WHITELIST, status_forcelist=None,
             backoff_factor=0, raise_on_redirect=True, raise_on_status=True,
             history=None, respect_retry_after_header=True):

The default value of respect_retry_after_header is True, and in your case you need it to be False. If you inspect the response using curl, you can see why:

$ curl -I http://www.etudes.ccip.fr/maintenance_site.php
HTTP/1.1 503 Service Temporarily Unavailable
Date: Thu, 23 Nov 2017 14:15:49 GMT
Server: Apache
Status: 503 Service Temporarily Unavailable
Retry-After: 3600
Expires: Sat, 26 Jul 1997 05:00:00 GMT
Cache-Control: pre-check=0, post-check=0, max-age=0
Pragma: no-cache
Connection: close
Content-Type: text/html; charset=ISO-8859-1
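You can make the same check from Python. The default adapter performs no retries, so this call returns immediately even for the 503 (the commented values are what the server returned at the time of writing):

    import requests

    response = requests.get('http://www.etudes.ccip.fr/maintenance_site.php', timeout=2)
    print(response.status_code)                 # 503
    print(response.headers.get('Retry-After'))  # '3600'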

You want respect_retry_after_header to be set to False. This can be done by creating the adapter and then flipping the flag on its Retry object:

import requests
from requests.adapters import HTTPAdapter

url = 'http://www.etudes.ccip.fr/maintenance_site.php'
session = requests.Session()

adapter = HTTPAdapter(max_retries=2)
adapter.max_retries.respect_retry_after_header = False

session.mount('http://', adapter)

session.get(url, timeout=2)
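Equivalently, you can construct the Retry object yourself and hand it to the adapter, assuming your urllib3 version accepts respect_retry_after_header in the constructor (as the signature above shows):

    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    # Same effect as above: retry twice, but ignore Retry-After.
    retry = Retry(total=2, respect_retry_after_header=False)

    session = requests.Session()
    session.mount('http://', HTTPAdapter(max_retries=retry))
    session.get('http://www.etudes.ccip.fr/maintenance_site.php', timeout=2)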
answered Sep 28 '22 by Tarun Lalwani

The Retry-After header in the response is the problem: it puts the connection to sleep for 3600 seconds on each retry. See retry.py in urllib3:

    def sleep(self, response=None):
        """ Sleep between retry attempts.

        This method will respect a server's ``Retry-After`` response header
        and sleep the duration of the time requested. If that is not present, it
        will use an exponential backoff. By default, the backoff factor is 0 and
        this method will return immediately.
        """

        if response:
            slept = self.sleep_for_retry(response)
            if slept:
                return

        self._sleep_backoff()

The solution is to set max_retries=0 (no retries at all). This avoids up to 2 × 3600 seconds of sleeping before your request finally returns.
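A minimal sketch of that fix:

    import requests
    from requests.adapters import HTTPAdapter

    url = 'http://www.etudes.ccip.fr/maintenance_site.php'
    session = requests.Session()

    # With no retries there is nothing to sleep between, so the
    # Retry-After header is never honoured and the call returns quickly.
    session.mount('http://', HTTPAdapter(max_retries=0))
    session.get(url, timeout=2)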

answered Sep 28 '22 by TheCurl