When requesting a web resource, website, or web service with the requests library, the request takes a long time to complete. The code looks similar to the following:
```python
import requests

requests.get("https://www.example.com/")
```
This request takes over 2 minutes (exactly 2 minutes 10 seconds) to complete! Why is it so slow and how can I fix it?
Python requests is slow and takes very long to complete HTTP or HTTPS request - Stack Overflow
Requests verifies SSL certificates for HTTPS requests, just like a web browser. SSL Certificates are small data files that digitally bind a cryptographic key to an organization's details.
requests.get() blocks: it waits until the response arrives before the rest of your program executes. If you want to be able to do other things in the meantime, you will probably want to look at the asyncio or multiprocessing modules. – Chad S.
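As a sketch of the asyncio route (the helper name fetch_many and the example URL are my choices, not from the answer above), you can push the blocking calls into worker threads so they overlap:

```python
import asyncio
import requests

async def fetch_many(urls, fetch=requests.get):
    # asyncio.to_thread (Python 3.9+) runs each blocking call in a worker
    # thread, so the requests overlap instead of running one after another.
    return await asyncio.gather(*(asyncio.to_thread(fetch, u) for u in urls))

# Usage (performs real network requests):
# responses = asyncio.run(fetch_many(["https://www.example.com"] * 3))
```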
There are several possible causes, and therefore several possible solutions. There are a multitude of answers on Stack Overflow for each of these, so I will try to combine them all to save you the hassle of searching for them.
In my search I have uncovered the following layers to this:
For many problems, activating logging can help you uncover what goes wrong (source):
```python
import requests
import logging
import http.client

http.client.HTTPConnection.debuglevel = 1

# You must initialize logging, otherwise you'll not see debug output.
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True

requests.get("https://www.example.com")
```
In case the debug output does not help you solve the problem, read on.
It can be faster not to request the full data, but to send only a HEAD request (source):
```python
requests.head("https://www.example.com")
```
Some servers don't support this; in that case, you can try to stream the response (source):
```python
requests.get("https://www.example.com", stream=True)
```
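With stream=True alone the body is not downloaded until you consume it; here is a minimal sketch of consuming it in chunks (the helper name stream_to_file and the chunk size are my choices, not from the answer):

```python
def stream_to_file(response, path, chunk_size=8192):
    # Read the body piece by piece instead of loading it into memory at once.
    with open(path, "wb") as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)

# Usage (performs a real network request):
# import requests
# with requests.get("https://www.example.com", stream=True) as r:
#     stream_to_file(r, "page.html")
```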
If you send multiple requests in a row, you can speed them up by utilizing a requests.Session. This keeps the connection to the server open and configured, and also persists cookies as a nice benefit. Try this (source):
```python
import requests

session = requests.Session()
for _ in range(10):
    session.get("https://www.example.com")
```
If you send a very large number of requests at once, each request blocks execution. You can parallelize this utilizing, e.g., requests-futures (idea from kederrac):
```python
from concurrent.futures import as_completed
from requests_futures.sessions import FuturesSession

with FuturesSession() as session:
    futures = [session.get("https://www.example.com") for _ in range(10)]
    for future in as_completed(futures):
        response = future.result()
```
Be careful not to overwhelm the server with too many requests at the same time.
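If you prefer to avoid an extra dependency, a bounded thread pool from the standard library achieves the same parallelism while capping the concurrency; the helper name fetch_all and the limit of 5 workers are my choices, not from the answer:

```python
import requests
from concurrent.futures import ThreadPoolExecutor

def fetch_all(urls, fetch=requests.get, max_workers=5):
    # max_workers bounds how many requests run concurrently,
    # so the server never sees more than that many at once.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))

# Usage (performs real network requests):
# responses = fetch_all(["https://www.example.com"] * 10)
```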
If this also does not solve your problem, read on...
In many cases, the reason might lie with the server you are requesting from. First, verify this by requesting any other URL in the same fashion:
```python
requests.get("https://www.google.com")
```
If this works fine, you can focus your efforts on the following possible problems:
The server might specifically block the default requests user-agent string, or it might utilize a whitelist, or reject you for some other reason. To send a nicer user-agent string, try this (source):
```python
headers = {
    "User-Agent": "Mozilla/5.0 (X11; CrOS x86_64 12871.102.0) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/81.0.4044.141 Safari/537.36"
}
requests.get("https://www.example.com", headers=headers)
```
If this problem only occurs sometimes, e.g. after a few requests, the server might be rate-limiting you. Check the response to see if it reads something along those lines (e.g. "rate limit reached", "work queue depth exceeded", or similar; source).
Here, the solution is to wait longer between requests, for example by using time.sleep().
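A simple sketch of waiting between retries with exponential backoff; the helper name, the use of status code 429, and the delay schedule are my assumptions, not from the answer:

```python
import time
import requests

def get_with_backoff(url, max_retries=5, get=requests.get, sleep=time.sleep):
    # Double the wait after each rate-limited response: 1s, 2s, 4s, ...
    delay = 1
    for _ in range(max_retries):
        response = get(url)
        if response.status_code != 429:  # 429 = Too Many Requests
            return response
        sleep(delay)
        delay *= 2
    return response

# Usage (performs real network requests):
# r = get_with_backoff("https://www.example.com")
```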
You can check whether the response itself is the bottleneck by not reading the response you receive from the server. If the code is still slow, this is not your problem; but if this fixed it, the problem might lie with parsing the response. Two likely causes are a chunked response that the server does not declare as chunked, and slow character-set detection because the server declares no encoding. To fix those, try:
```python
r = requests.get("https://www.example.com")
r.raw.chunked = True  # Fix issue 1: undeclared chunked transfer encoding
r.encoding = "utf-8"  # Fix issue 2: skip slow encoding detection
print(r.text)
```
This might be the worst problem of all to find. An easy, albeit weird, way to check for it is to add a timeout parameter as follows:
```python
requests.get("https://www.example.com/", timeout=5)
```
If this returns a successful response, the problem lies with IPv6. The reason is that requests first tries an IPv6 connection. When that times out, it falls back to IPv4. By setting the timeout low, you force it to switch to IPv4 within a shorter amount of time.
Verify by utilizing, e.g., wget or curl:
```shell
wget --inet6-only https://www.example.com -O - > /dev/null
# or
curl --ipv6 -v https://www.example.com
```
In both cases, we force the tool to connect via IPv6 to isolate the issue. If this times out, try again forcing IPv4:
```shell
wget --inet4-only https://www.example.com -O - > /dev/null
# or
curl --ipv4 -v https://www.example.com
```
If this works fine, you have found your problem! But how to solve it, you ask?
You have a couple of options. You can force requests to connect via IPv4 only (by configuring the underlying resolver to use socket.AF_INET for IPv4). Or, if your connection goes through an SSH tunnel, add AddressFamily inet to your SSH config.
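To force IPv4 programmatically, the usual trick is to patch the helper that urllib3 (which requests uses under the hood) consults when resolving host names. Treat this as a sketch, since it relies on urllib3 internals:

```python
import socket
import urllib3.util.connection as urllib3_connection

def allowed_gai_family():
    # Tell urllib3 to resolve host names to IPv4 addresses only.
    return socket.AF_INET

urllib3_connection.allowed_gai_family = allowed_gai_family

# From now on, requests (via urllib3) will connect over IPv4 only:
# import requests
# requests.get("https://www.example.com/")
```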