Python - urllib2 & cookielib

I am trying to open the following website, retrieve the initial cookie, and reuse it for a second request. However, running the code below prints two different cookies. How do I use the initial cookie for the second opener.open() call?

import cookielib, urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

home = opener.open('https://www.idcourts.us/repository/start.do')
print cj

search = opener.open('https://www.idcourts.us/repository/partySearch.do')
print cj

The output shows two different cookies every time:

<cookielib.CookieJar[<Cookie JSESSIONID=0DEEE8331DE7D0DFDC22E860E065085F for www.idcourts.us/repository>]>
<cookielib.CookieJar[<Cookie JSESSIONID=E01C2BE8323632A32DA467F8A9B22A51 for www.idcourts.us/repository>]>
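
A quick offline check can help separate client behavior from server behavior here. The sketch below uses the Python 3 module names (http.cookiejar and urllib.request, which replace cookielib and urllib2) and a hand-built cookie, so no request is actually sent. make_session_cookie is a hypothetical helper, and the cookie flags are assumptions rather than values observed from the site.

```python
from http.cookiejar import Cookie, CookieJar
from urllib.request import Request


def make_session_cookie(value):
    """Hand-build a JSESSIONID cookie like the one printed above (flags are assumed)."""
    return Cookie(
        version=0, name="JSESSIONID", value=value,
        port=None, port_specified=False,
        domain="www.idcourts.us", domain_specified=False, domain_initial_dot=False,
        path="/repository", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


cj = CookieJar()
cj.set_cookie(make_session_cookie("0DEEE8331DE7D0DFDC22E860E065085F"))

# Ask the jar which Cookie header it would attach to the second request.
req = Request("https://www.idcourts.us/repository/partySearch.do")
cj.add_cookie_header(req)
cookie_header = req.get_header("Cookie")
print(cookie_header)
```

If the header comes back populated, the jar is replaying the cookie as expected, which suggests that a changing JSESSIONID is the server issuing a new session rather than the opener losing the old one.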
Adrian asked Jan 03 '11 08:01


1 Answer

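This is not a problem with urllib2. The site does something unusual: it will not validate your session ID until you have requested a couple of its stylesheets with the session ID appended to the URL. It also rejects the default urllib2 User-Agent, so set a browser-like one: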

import cookielib, urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# default User-Agent ('Python-urllib/2.6') will *not* work
opener.addheaders = [
    ('User-Agent', 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.11) Gecko/20101012 Firefox/3.6.11'),
    ]

stylesheets = [
    'https://www.idcourts.us/repository/css/id_style.css',
    'https://www.idcourts.us/repository/css/id_print.css',
]

home = opener.open('https://www.idcourts.us/repository/start.do')
print cj
sessid = cj._cookies['www.idcourts.us']['/repository']['JSESSIONID'].value
# Note the +=
opener.addheaders += [
    ('Referer', 'https://www.idcourts.us/repository/start.do'),
    ]
for st in stylesheets:
    # da trick
    opener.open(st+';jsessionid='+sessid)
search = opener.open('https://www.idcourts.us/repository/partySearch.do')
print cj
# perhaps need to keep updating the referer...
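
The cj._cookies[...] lookup above reaches into a private attribute. As a hedged alternative, the sketch below iterates the jar, which is public API, and builds the ;jsessionid= URLs the same way the loop does. It uses Python 3 module names (http.cookiejar instead of cookielib). find_cookie, with_jsessionid, and the hand-built demo cookie are hypothetical names for illustration, and nothing is sent over the network.

```python
from http.cookiejar import Cookie, CookieJar


def find_cookie(jar, name, domain, path):
    """Return the value of the first cookie matching name/domain/path, or None."""
    for cookie in jar:  # a CookieJar iterates over its Cookie objects
        if (cookie.name, cookie.domain, cookie.path) == (name, domain, path):
            return cookie.value
    return None


def with_jsessionid(url, sessid):
    """Append the session id as a path parameter, as the stylesheet loop does."""
    return url + ";jsessionid=" + sessid


# Demo only: a hand-built cookie stands in for the one set by start.do.
jar = CookieJar()
jar.set_cookie(Cookie(
    version=0, name="JSESSIONID", value="0DEEE8331DE7D0DFDC22E860E065085F",
    port=None, port_specified=False,
    domain="www.idcourts.us", domain_specified=False, domain_initial_dot=False,
    path="/repository", path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={},
))

sessid = find_cookie(jar, "JSESSIONID", "www.idcourts.us", "/repository")
stylesheet_url = with_jsessionid(
    "https://www.idcourts.us/repository/css/id_style.css", sessid
)
print(stylesheet_url)
```

Returning None when no cookie matches lets the caller decide what to do if start.do did not set a session, whereas the nested dictionary lookup raises KeyError.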
albertov answered Sep 21 '22 09:09