I am working with Python's requests module for HTTP communication, and I am wondering how to reuse already-established TCP connections. The requests module is stateless, and if I repeatedly call get for the same URL, wouldn't it create a new connection each time?
Thanks!!
The requests module allows you to send HTTP requests using Python. The HTTP request returns a Response Object with all the response data (content, encoding, status, etc).
Session() is not thread-safe. This means that unwanted interactions can happen if multiple threads use the same Session.
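One common way to sidestep the thread-safety concern is to give each thread its own Session. The following is only a minimal sketch of that idea: the helper name get_session is hypothetical (it is not part of requests), and it assumes that per-thread connection pools are acceptable for your workload.

```python
import threading

import requests

# Each thread sees its own attributes on this object.
_thread_local = threading.local()


def get_session():
    """Return a Session owned by the calling thread (hypothetical helper)."""
    session = getattr(_thread_local, "session", None)
    if session is None:
        # First call from this thread: create and remember a Session.
        session = requests.Session()
        _thread_local.session = session
    return session
```

Within one thread, repeated calls return the same Session, so its connection pool is still reused; different threads never share a Session.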
Global functions like requests.get or requests.post create a new requests.Session instance on each call. Connections made with these functions cannot be reused, because you cannot access the automatically created session and use its connection pool for subsequent requests. It's fine to use these functions if you only have to make a few requests. Otherwise you'll want to manage sessions yourself.
Here is a quick demonstration of how requests behaves when you use the global get function and when you use a session.
Preparation, not really relevant to the question:
>>> import logging, requests, timeit
>>> logging.basicConfig(level=logging.DEBUG, format="%(message)s")
See, a new connection is established each time you call get:
>>> _ = requests.get("https://www.wikipedia.org")
Starting new HTTPS connection (1): www.wikipedia.org
>>> _ = requests.get("https://www.wikipedia.org")
Starting new HTTPS connection (1): www.wikipedia.org
But if you use the same session for subsequent calls, the connection gets reused:
>>> session = requests.Session()
>>> _ = session.get("https://www.wikipedia.org")
Starting new HTTPS connection (1): www.wikipedia.org
>>> _ = session.get("https://www.wikipedia.org")
>>> _ = session.get("https://www.wikipedia.org")
>>> _ = session.get("https://www.wikipedia.org")
Performance:
>>> timeit.timeit('_ = requests.get("https://www.wikipedia.org")', 'import requests', number=100)
Starting new HTTPS connection (1): www.wikipedia.org
Starting new HTTPS connection (1): www.wikipedia.org
Starting new HTTPS connection (1): www.wikipedia.org
...
Starting new HTTPS connection (1): www.wikipedia.org
Starting new HTTPS connection (1): www.wikipedia.org
Starting new HTTPS connection (1): www.wikipedia.org
52.74904417991638
>>> timeit.timeit('_ = session.get("https://www.wikipedia.org")', 'import requests; session = requests.Session()', number=100)
Starting new HTTPS connection (1): www.wikipedia.org
15.770191192626953
It runs much faster when you reuse the session (and thus the session's connection pool).
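If you would rather check the same behavior without hitting a public site or reading debug logs, the comparison can be reproduced against a throwaway local server. This is only a sketch: CountingHandler and count_connections are hypothetical names made up for the illustration, the server is the standard library's http.server, and it assumes no proxy settings interfere with requests to 127.0.0.1.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class CountingHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that counts accepted TCP connections."""

    protocol_version = "HTTP/1.1"  # lets the server keep connections alive
    connections = 0

    def setup(self):
        # setup() runs once per accepted connection, not once per request.
        super().setup()
        CountingHandler.connections += 1

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def count_connections(calls=3):
    """Return (connections opened by requests.get, connections opened by one Session)."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    try:
        # Global function: a fresh, short-lived session per call.
        for _ in range(calls):
            requests.get(url)
        with_global = CountingHandler.connections

        # One explicit session shared by all calls.
        with requests.Session() as session:
            for _ in range(calls):
                session.get(url)
        with_session = CountingHandler.connections - with_global
    finally:
        server.shutdown()
        server.server_close()
    return with_global, with_session


if __name__ == "__main__":
    print(count_connections(3))
```

The first number should track the number of calls, while the second should stay at one as long as the server keeps the connection open.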
The requests module is stateless and if I repeatedly call get for the same URL, wouldn't it create a new connection each time?
The requests module is not stateless; it just lets you ignore the state and effectively use a global singleton state if you choose to do so.*
And it (or, rather, one of the underlying libraries, urllib3) maintains a connection pool keyed by a (hostname, port) pair, so it will usually just magically reuse a connection if it can.
As the documentation says:
Excellent news — thanks to urllib3, keep-alive is 100% automatic within a session! Any requests that you make within a session will automatically reuse the appropriate connection!
Note that connections are only released back to the pool for reuse once all body data has been read; be sure to either set stream to False or read the content property of the Response object.
So, what does "if it can" mean? As the docs above imply, if you're keeping streaming response objects alive, their connections obviously can't be reused.
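As a rough illustration of the streaming caveat, the sketch below reads a streamed body to the end before leaving the with block, which is what allows the connection to go back to the pool. The function name download_streamed and the chunk size are assumptions made for the example, not anything prescribed by requests.

```python
import requests


def download_streamed(session: requests.Session, url: str, chunk_size: int = 8192) -> bytes:
    """Stream a response through the given session and return the full body.

    Hypothetical helper: the body is consumed completely, so the underlying
    connection can be released back to the session's pool.
    """
    chunks = []
    # The response is closed when the with block exits.
    with session.get(url, stream=True) as response:
        response.raise_for_status()
        for chunk in response.iter_content(chunk_size=chunk_size):
            chunks.append(chunk)
    return b"".join(chunks)
```

If you stop reading partway through a streamed body and keep the response object around, that connection stays tied up and cannot serve the next request.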
Also, the connection pool is really a finite cache, not infinite, so if you spam out a ton of connections and two of them are to the same server, you won't always reuse the connection, just often. But usually, that's what you actually want.
* The particular state relevant here is the transport adapter. Each session gets a transport adapter. You can specify the adapter manually, or you can specify a global default, or you can just use the default global default, which basically just wraps up a urllib3.PoolManager for managing its HTTP connections. For more information, read the docs.
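Specifying the adapter manually might look like the sketch below, which mounts an HTTPAdapter with a larger pool than the default. The helper name make_session and the value 20 are arbitrary choices for illustration; pool_connections and pool_maxsize are real HTTPAdapter parameters (how many per-host pools to cache, and how many connections to keep in each pool).

```python
import requests
from requests.adapters import HTTPAdapter


def make_session(pool_connections: int = 20, pool_maxsize: int = 20) -> requests.Session:
    """Build a Session whose adapter uses a custom pool size (hypothetical helper)."""
    session = requests.Session()
    adapter = HTTPAdapter(pool_connections=pool_connections, pool_maxsize=pool_maxsize)
    # Route both schemes through the same adapter and its pool manager.
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

A larger pool makes it less likely that a connection to a given server has been evicted by the time you need it again, at the cost of holding more sockets open.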