Python requests is slow and takes very long to complete HTTP or HTTPS request

When requesting a web resource or website or web service with the requests library, the request takes a long time to complete. The code looks similar to the following:

import requests
requests.get("https://www.example.com/")

This request takes over 2 minutes (exactly 2 minutes 10 seconds) to complete! Why is it so slow and how can I fix it?

asked Jun 26 '20 16:06

vauhochzett

1 Answer

There are several possible causes for this problem, each with its own fix. Stack Overflow has many answers covering each of them individually, so I will try to combine them all here to save you the hassle of searching for them.

In my search I have uncovered the following layers to this:

First, try logging

For many problems, activating logging can help you uncover what goes wrong (source):

import logging
import http.client
import requests

# Print the raw HTTP request and response headers to stdout.
http.client.HTTPConnection.debuglevel = 1

# You must initialize logging, otherwise you'll not see debug output.
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
# Current requests versions use the standalone urllib3 package, which logs under "urllib3".
requests_log = logging.getLogger("urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True

requests.get("https://www.example.com")

In case the debug output does not help you solve the problem, read on.

If you only need to check if the server is up, try a HEAD or streaming request

It can be faster to not request all data, but to only send a HEAD request (source):

requests.head("https://www.example.com")

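Some servers don't support HEAD requests. In that case, you can try to stream the response instead, which returns as soon as the headers arrive and only downloads the body when you access it (source):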

requests.get("https://www.example.com", stream=True)
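The two variants can be combined into one check. The following is only a sketch: server_is_up is a name I made up, the session argument is assumed to be a requests.Session (or anything with compatible head and get methods), and treating 405 and 501 as "HEAD not supported" is my assumption about how such servers usually answer.

```python
def server_is_up(session, url, timeout=5):
    """Return True if the server answers with a non-error status.

    The response body is never downloaded. Connection errors and
    timeouts are not caught here and propagate to the caller.
    """
    # HEAD does not follow redirects by default in requests, so ask for it.
    response = session.head(url, timeout=timeout, allow_redirects=True)
    if response.status_code in (405, 501):
        # Assumption: these codes mean HEAD is not allowed or not implemented.
        # stream=True returns once the headers arrive; the body is left unread.
        response = session.get(url, timeout=timeout, stream=True)
        response.close()  # release the connection without reading the body
    return response.status_code < 400

# Usage (not executed here):
#   import requests
#   server_is_up(requests.Session(), "https://www.example.com")
```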

For multiple requests in a row, try utilizing a Session

If you send multiple requests in a row, you can speed them up by utilizing a requests.Session. A session keeps the connection to the server open and configured, so later requests skip the connection setup. As a nice benefit, it also persists cookies. Try this (source):

import requests
session = requests.Session()
for _ in range(10):
    session.get("https://www.example.com")

To parallelize your requests (try for > 10 requests), use requests-futures

If you send a very large number of requests one after another, each request blocks execution until its response arrives, so the waiting times add up. You can parallelize this utilizing, e.g., requests-futures (idea from kederrac):

from concurrent.futures import as_completed
from requests_futures.sessions import FuturesSession

with FuturesSession() as session:
    futures = [session.get("https://www.example.com") for _ in range(10)]
    for future in as_completed(futures):
        response = future.result()

Be careful not to overwhelm the server with too many requests at the same time.
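One way to keep the load on the server bounded is to cap the number of requests in flight. This sketch swaps in the standard library's ThreadPoolExecutor instead of requests-futures. fetch_all and its fetch parameter are hypothetical names, and the default of 4 workers is an arbitrary assumption you should tune for the server in question.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(fetch, urls, max_workers=4):
    """Call fetch(url) for every URL, with at most max_workers calls running at once.

    Results are returned in the same order as the input URLs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))

# Usage (not executed here):
#   import requests
#   urls = ["https://www.example.com"] * 10
#   codes = fetch_all(lambda url: requests.get(url, timeout=10).status_code, urls)
```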

If this also does not solve your problem, read on...

The reason might not lie with requests, but the server or your connection

In many cases, the reason might lie with the server you are requesting from. First, verify this by requesting any other URL in the same fashion:

requests.get("https://www.google.com")

If this works fine, you can focus your efforts on the following possible problems:

The server only allows specific user-agent strings

The server might specifically block the default user agent of requests, or it might only allow user agents on a whitelist, or there might be some other reason. To send a nicer user-agent string, try this (source):

headers = {"User-Agent": "Mozilla/5.0 (X11; CrOS x86_64 12871.102.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.141 Safari/537.36"}
requests.get("https://www.example.com", headers=headers)
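If you combine this with a Session, you can set the header once instead of passing it on every call. A small sketch, where make_session is a name I made up and the user-agent string in the usage comment is only a placeholder:

```python
import requests

def make_session(user_agent):
    """Return a Session that sends the given User-Agent with every request."""
    session = requests.Session()
    # Session-level headers are merged into each request made through the session.
    session.headers.update({"User-Agent": user_agent})
    return session

# Usage (not executed here):
#   session = make_session("Mozilla/5.0 (X11; Linux x86_64)")
#   session.get("https://www.example.com")
```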

The server rate-limits you

If this problem only occurs sometimes, e.g. after a few requests, the server might be rate-limiting you. Check the response to see if it reads something along those lines (e.g. "rate limit reached", "work queue depth exceeded" or similar; source).

Here, the solution is just to wait longer between requests, for example by using time.sleep().
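A rough sketch of such a wait-and-retry loop follows. It assumes the server signals rate limiting with HTTP status 429 and possibly a Retry-After header given in seconds. Many servers do this, but yours may only put a message in the body, so adapt the check. get_with_backoff and its parameters are hypothetical names, and session is assumed to be a requests.Session.

```python
import time

def get_with_backoff(session, url, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """GET url, pausing and retrying while the server answers 429 Too Many Requests."""
    for attempt in range(max_attempts):
        response = session.get(url, timeout=10)
        # Return on any other status, or when the attempts are used up.
        if response.status_code != 429 or attempt == max_attempts - 1:
            return response
        # Prefer the server's hint when it is a plain number of seconds;
        # otherwise double the wait on every attempt (1s, 2s, 4s, ...).
        retry_after = response.headers.get("Retry-After", "")
        delay = float(retry_after) if retry_after.isdigit() else base_delay * 2 ** attempt
        sleep(delay)

# Usage (not executed here):
#   import requests
#   response = get_with_backoff(requests.Session(), "https://www.example.com")
```

The sleep parameter only exists so the waiting can be replaced in tests.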

The server response is incorrectly formatted, leading to parsing problems

You can check this by not reading the response you receive from the server. If the code is still slow, this is not your problem, but if skipping the read fixes it, the problem might lie with parsing the response.
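To make this check concrete, you can time the two phases separately. This is just a sketch: time_phases is a made-up name, and session is assumed to be a requests.Session. With stream=True the call returns once the headers have arrived, and accessing .content then downloads the body.

```python
import time

def time_phases(session, url, timeout=10):
    """Return (seconds until headers, seconds to read the body, body size in bytes)."""
    start = time.perf_counter()
    # stream=True: return as soon as the headers are in, without reading the body.
    response = session.get(url, stream=True, timeout=timeout)
    headers_done = time.perf_counter()
    body = response.content  # this access downloads the whole body
    body_done = time.perf_counter()
    return headers_done - start, body_done - headers_done, len(body)

# Usage (not executed here):
#   import requests
#   print(time_phases(requests.Session(), "https://www.example.com"))
```

If nearly all of the time falls into the second number, reading the body is the slow part.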

1. In case some headers are set incorrectly, this can lead to parsing errors which prevent chunked transfer (source).
2. In other cases, setting the encoding manually might resolve parsing problems (source).

To fix those, try:

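import requests
# stream=True defers reading the body, so the settings below apply before it is read.
r = requests.get("https://www.example.com", stream=True)
r.raw.chunked = True  # Fix issue 1
r.encoding = 'utf-8'  # Fix issue 2
print(r.text)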

IPv6 does not work, but IPv4 does

This might be the worst problem of all to find. An easy, albeit weird, way to check this is to add a timeout parameter as follows:

requests.get("https://www.example.com/", timeout=5)

If this returns a successful response, the problem should lie with IPv6. The reason is that requests first tries an IPv6 connection. When that times out, it tries to connect via IPv4. By setting the timeout low, you force it to switch to IPv4 within a shorter amount of time.
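You can also compare the two address families directly from Python. The sketch below uses only the standard library socket module and measures a plain TCP connect, so it says nothing about TLS or HTTP. time_connect is a name I made up, and port 443 in the usage comment assumes an HTTPS server.

```python
import socket
import time

def time_connect(host, port, family, timeout=5.0):
    """Return the seconds needed to open a TCP connection, or None if it fails.

    family is socket.AF_INET (IPv4) or socket.AF_INET6 (IPv6).
    """
    try:
        candidates = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return None  # the host has no address for this family
    for af, socktype, proto, _canonname, sockaddr in candidates:
        start = time.perf_counter()
        try:
            with socket.socket(af, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            return time.perf_counter() - start
        except OSError:
            continue  # timed out, refused or unreachable: try the next address
    return None

# Usage (not executed here):
#   print("IPv6:", time_connect("www.example.com", 443, socket.AF_INET6))
#   print("IPv4:", time_connect("www.example.com", 443, socket.AF_INET))
```

If the IPv6 call returns None after the full timeout while the IPv4 call returns quickly, that points in the same direction.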

Verify by utilizing, e.g., wget or curl:

wget --inet6-only https://www.example.com -O - > /dev/null
# or
curl --ipv6 -v https://www.example.com

In both cases, we force the tool to connect via IPv6 to isolate the issue. If this times out, try again forcing IPv4:

wget --inet4-only https://www.example.com -O - > /dev/null
# or
curl --ipv4 -v https://www.example.com

If this works fine, you have found your problem! But how to solve it, you ask?

1. A brute-force solution is to disable IPv6 completely.
2. You may also disable IPv6 for the current session only.
3. You may just want to force requests to use IPv4. (In the linked answer, you have to adapt the code to always return socket.AF_INET for IPv4.)
4. If you want to fix this problem for SSH, here is how to force IPv4 for SSH. (In short, add AddressFamily inet to your SSH config.)
5. You may also want to check if the problem lies with your DNS or TCP.
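For the option of forcing requests to use IPv4, the idea of always returning socket.AF_INET can be sketched as follows. This assumes that requests runs on top of urllib3 and that urllib3 chooses the address family through urllib3.util.connection.allowed_gai_family. That holds for the versions I know of, but it is an internal detail, so check it against yours. force_ipv4 is a name I made up.

```python
import socket

import urllib3.util.connection as urllib3_connection

def force_ipv4():
    """Make urllib3, and therefore requests, resolve host names to IPv4 only.

    This replaces a urllib3 function for the whole process (monkeypatching),
    so it affects every library that uses urllib3.
    """
    urllib3_connection.allowed_gai_family = lambda: socket.AF_INET

# Usage (not executed here):
#   import requests
#   force_ipv4()
#   requests.get("https://www.example.com/")
```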

answered Oct 05 '22 17:10

vauhochzett
