
python requests module and connection reuse

I am working with Python's requests module for HTTP communication, and I am wondering how to reuse already-established TCP connections. The requests module is stateless and if I repeatedly call get for the same URL, wouldn't it create a new connection each time?

Thanks!!

gmemon asked Jul 21 '14 20:07



2 Answers

Global functions like requests.get or requests.post create a new requests.Session instance on each call. Connections made with these functions cannot be reused, because you cannot access the automatically created session and use its connection pool for subsequent requests. It's fine to use these functions if you only need to make a few requests. Otherwise you'll want to manage sessions yourself.

Here is a quick demonstration of how requests behaves when you use the global get function versus a session.

Preparation, not really relevant to the question:

    >>> import logging, requests, timeit
    >>> logging.basicConfig(level=logging.DEBUG, format="%(message)s")

See, a new connection is established each time you call get:

    >>> _ = requests.get("https://www.wikipedia.org")
    Starting new HTTPS connection (1): www.wikipedia.org
    >>> _ = requests.get("https://www.wikipedia.org")
    Starting new HTTPS connection (1): www.wikipedia.org

But if you use the same session for subsequent calls, the connection gets reused:

    >>> session = requests.Session()
    >>> _ = session.get("https://www.wikipedia.org")
    Starting new HTTPS connection (1): www.wikipedia.org
    >>> _ = session.get("https://www.wikipedia.org")
    >>> _ = session.get("https://www.wikipedia.org")
    >>> _ = session.get("https://www.wikipedia.org")

Performance:

    >>> timeit.timeit('_ = requests.get("https://www.wikipedia.org")', 'import requests', number=100)
    Starting new HTTPS connection (1): www.wikipedia.org
    Starting new HTTPS connection (1): www.wikipedia.org
    Starting new HTTPS connection (1): www.wikipedia.org
    ...
    Starting new HTTPS connection (1): www.wikipedia.org
    Starting new HTTPS connection (1): www.wikipedia.org
    Starting new HTTPS connection (1): www.wikipedia.org
    52.74904417991638
    >>> timeit.timeit('_ = session.get("https://www.wikipedia.org")', 'import requests; session = requests.Session()', number=100)
    Starting new HTTPS connection (1): www.wikipedia.org
    15.770191192626953

Reusing the session (and thus the session's connection pool) is much faster: roughly 15.8 s versus 52.7 s for 100 requests in the run above.
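If you would rather check the reuse without hitting a public site, the same comparison can be sketched against a throwaway local server that counts accepted TCP connections. This is only an illustration under assumptions: CountingHandler and count_connections are hypothetical names made up for this sketch, and the server is the standard library's http.server, not anything that ships with requests.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class CountingHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler that counts accepted TCP connections."""

    protocol_version = "HTTP/1.1"  # HTTP/1.1 keeps connections alive by default
    connections = 0

    def setup(self):
        # setup() runs once per accepted connection, not once per request.
        CountingHandler.connections += 1
        super().setup()

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def count_connections(n=5):
    """Return (connections opened by requests.get, connections opened by one Session)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    try:
        # Global function: a fresh Session (and pool) on every call.
        CountingHandler.connections = 0
        for _ in range(n):
            requests.get(url)
        per_call = CountingHandler.connections

        # One Session: the pooled connection is reused between calls.
        CountingHandler.connections = 0
        with requests.Session() as session:
            for _ in range(n):
                session.get(url)
        pooled = CountingHandler.connections
    finally:
        server.shutdown()
        server.server_close()
    return per_call, pooled


if __name__ == "__main__":
    print(count_connections())
```

The first number should equal n (one connection per requests.get call) and the second should be 1, assuming no proxy sits between the client and 127.0.0.1.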

Діма Киричук answered Sep 20 '22 00:09


The requests module is stateless and if I repeatedly call get for the same URL, wouldn't it create a new connection each time?

The requests module is not stateless; it just lets you ignore the state and effectively use a global singleton state if you choose to do so.*

And it (or, rather, one of the underlying libraries, urllib3) maintains a connection pool keyed by (hostname, port) pair, so it will usually just magically reuse a connection if it can.

As the documentation says:

Excellent news — thanks to urllib3, keep-alive is 100% automatic within a session! Any requests that you make within a session will automatically reuse the appropriate connection!

Note that connections are only released back to the pool for reuse once all body data has been read; be sure to either set stream to False or read the content property of the Response object.

So, what does "if it can" mean? As the docs above imply, if you're keeping streaming response objects alive, their connections obviously can't be reused.
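To make the streaming caveat concrete, here is a rough sketch that holds several stream=True responses alive and counts how many TCP connections a local test server sees, with and without reading the body. The names CountingHandler and connections_used are hypothetical helpers invented for this example, and the server comes from the standard library's http.server.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class CountingHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler that counts accepted TCP connections."""

    protocol_version = "HTTP/1.1"  # keep-alive by default
    connections = 0

    def setup(self):
        # Called once per accepted connection.
        CountingHandler.connections += 1
        super().setup()

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def connections_used(read_body, n=3):
    """Issue n streaming GETs on one Session and return the connection count."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    CountingHandler.connections = 0
    try:
        with requests.Session() as session:
            responses = []
            for _ in range(n):
                response = session.get(url, stream=True)
                if read_body:
                    # Reading the whole body hands the connection back to the pool.
                    _ = response.content
                responses.append(response)  # keep every response object alive
            used = CountingHandler.connections
            for response in responses:
                response.close()
    finally:
        server.shutdown()
        server.server_close()
    return used


if __name__ == "__main__":
    print("unread streams:", connections_used(read_body=False))
    print("bodies read:   ", connections_used(read_body=True))
```

With the bodies left unread, each request needs its own connection; once the content is consumed, the single pooled connection is enough.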

Also, the connection pool is really a finite cache, not infinite, so if you spam out a ton of connections and two of them are to the same server, you won't always reuse the connection, just often. But usually, that's what you actually want.


* The particular state relevant here is the transport adapter. Each session gets a transport adapter. You can specify the adapter manually, or you can specify a global default, or you can just use the default global default, which basically just wraps up a urllib3.PoolManager for managing its HTTP connections. For more information, read the docs.
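For illustration, specifying the adapter manually might look like the sketch below. HTTPAdapter and Session.mount are part of requests; the helper name make_session and the pool sizes of 20 are arbitrary choices for this example, not recommendations. Here pool_connections is the number of per-host pools the adapter caches, and pool_maxsize is the number of connections kept in each pool.

```python
import requests
from requests.adapters import HTTPAdapter


def make_session(pool_connections=20, pool_maxsize=20):
    """Build a Session whose HTTP(S) adapter uses the given pool limits.

    make_session is a hypothetical helper for this sketch.
    """
    session = requests.Session()
    adapter = HTTPAdapter(
        pool_connections=pool_connections,  # how many per-host pools to cache
        pool_maxsize=pool_maxsize,          # connections kept per pool
    )
    # Mounting on a prefix replaces the default adapter for matching URLs.
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


if __name__ == "__main__":
    with make_session() as session:
        # get_adapter() shows which adapter would handle a given URL.
        print(type(session.get_adapter("https://www.wikipedia.org")).__name__)
```

Raising these limits only matters if you talk to many hosts or issue many concurrent requests through one session; for sequential requests to a single host the defaults already reuse the connection.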

abarnert answered Sep 22 '22 00:09