Python urllib3 and how to handle cookie support?

Tags: python, urllib3

I'm looking into urllib3 because it has connection pooling and is thread-safe (so performance is better, especially for crawling), but the documentation is minimal, to say the least. urllib2 has build_opener, so cookie handling there looks something like this:

#!/usr/bin/python
import cookielib, urllib2
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
r = opener.open("http://example.com/")

But urllib3 has no build_opener method, so the only way I have figured out so far is to set the cookie manually in the request headers:

#!/usr/bin/python
import urllib3
http_pool = urllib3.connection_from_url("http://example.com")
myheaders = {'Cookie':'some cookie data'}
# The pool is bound to example.com, so request a URL on that same host.
r = http_pool.get_url("http://example.com/", headers=myheaders)

But I am hoping there is a better way and that one of you can tell me what it is. Also can someone tag this with "urllib3" please.

bigredbob asked Mar 11 '10 06:03



1 Answer

You're correct, there's no immediately better way to do this right now. I would be more than happy to accept a patch if you have a congruent improvement.

One thing to keep in mind: urllib3's HTTPConnectionPool is intended to be a "pool of connections" to a specific host, as opposed to a stateful client. In that context, it makes sense to keep the tracking of cookies outside of the actual pool.
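As a rough illustration of keeping cookie tracking outside the pool, here is a minimal sketch, assuming Python 3 and a current urllib3 release where PoolManager.request() and response.headers.getlist() are available. CookieStore and get_with_cookies are hypothetical names invented for this example, not part of urllib3. The store is deliberately naive: it ignores domain, path, and expiry, so it is only reasonable when talking to a single trusted host.

```python
from http.cookies import SimpleCookie


class CookieStore:
    """Hypothetical helper: a naive name -> value cookie store kept outside the pool."""

    def __init__(self):
        self.cookies = {}

    def update(self, set_cookie_values):
        # Each item is the raw value of one Set-Cookie response header.
        for raw in set_cookie_values:
            parsed = SimpleCookie()
            parsed.load(raw)
            for name, morsel in parsed.items():
                # Attributes such as Path or HttpOnly are ignored here.
                self.cookies[name] = morsel.value

    def header(self):
        # Build the value of the outgoing Cookie request header.
        return "; ".join(f"{name}={value}" for name, value in self.cookies.items())


def get_with_cookies(http, url, store):
    """Send a GET through `http`, replaying stored cookies and recording new ones.

    `http` is assumed to behave like urllib3.PoolManager: it needs a
    request(method, url, headers=...) method whose response exposes
    headers.getlist(name).
    """
    headers = {"Cookie": store.header()} if store.cookies else {}
    response = http.request("GET", url, headers=headers)
    store.update(response.headers.getlist("Set-Cookie"))
    return response


def demo():
    # Not called automatically: needs urllib3 installed and network access.
    import urllib3

    http = urllib3.PoolManager()
    store = CookieStore()
    get_with_cookies(http, "http://example.com/", store)  # may receive cookies
    get_with_cookies(http, "http://example.com/", store)  # sends them back
```

The pool stays stateless and can still be shared between threads. Whether the store is shared or kept per crawl session is a separate choice made by the caller.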

  • shazow (the author of urllib3)
shazow answered Sep 28 '22 07:09