Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: urllib/urllib2/httplib confusion

I'm trying to test the functionality of a web app by scripting a login sequence in Python, but I'm having some troubles.

Here's what I need to do:

  1. Do a POST with a few parameters and headers.
  2. Follow a redirect
  3. Retrieve the HTML body.

Now, I'm relatively new to python, but the two things I've tested so far haven't worked. First I used httplib, with putrequest() (passing the parameters within the URL), and putheader(). This didn't seem to follow the redirects.

Then I tried urllib and urllib2, passing both headers and parameters as dicts. This seems to return the login page, instead of the page I'm trying to login to, I guess it's because of lack of cookies or something.

Am I missing something simple?

Thanks.

like image 748
Ace Avatar asked Nov 19 '08 13:11

Ace


People also ask

What is the difference between Urllib and urllib2?

1) urllib2 can accept a Request object to set the headers for a URL request, urllib accepts only a URL. 2) urllib provides the urlencode method which is used for the generation of GET query strings, urllib2 doesn't have such a function. This is one of the reasons why urllib is often used along with urllib2.

Is urllib2 built in Python?

urllib2 is a Python module that can be used for fetching URLs. The magic starts with importing the urllib2 module.

What does Urllib request return?

The urllib. parse. urlencode() function takes a mapping or sequence of 2-tuples and returns an ASCII string in this format. It should be encoded to bytes before being used as the data parameter.

What is the protocol used and the use of Urllib request?

We use the urllib. request module in Python to access and open URLs, which most often use the HTTP protocol. The interface used is also very simple for beginners to use and learn; it uses the urlopen function which can fetch various URLs using a variety of different protocols.


1 Answers

Focus on urllib2 for this, it works quite well. Don't mess with httplib, it's not the top-level API.

What you're noting is that urllib2 doesn't follow the redirect.

You need to fold in an instance of HTTPRedirectHandler that will catch and follow the redirects.

Further, you may want to subclass the default HTTPRedirectHandler to capture information that you'll then check as part of your unit testing.

cookie_handler= urllib2.HTTPCookieProcessor( self.cookies ) redirect_handler= HTTPRedirectHandler() opener = urllib2.build_opener(redirect_handler,cookie_handler) 

You can then use this opener object to POST and GET, handling redirects and cookies properly.

You may want to add your own subclass of HTTPHandler to capture and log various error codes, also.

like image 135
S.Lott Avatar answered Oct 01 '22 10:10

S.Lott