
Using Python urllib2 to send a POST request and get the response

I am trying to get the HTML page back from sending a POST request:

import httplib 
import urllib 
import urllib2 
from BeautifulSoup import BeautifulSoup


headers = {
    'Host': 'digitalvita.pitt.edu',
    'Connection': 'keep-alive',
    'Content-Length': '325', 
    'Origin': 'https://digitalvita.pitt.edu',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1',
    'Content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Accept': 'text/javascript, text/html, application/xml, text/xml, */*',
    'Referer': 'https://digitalvita.pitt.edu/index.php',
    'Accept-Encoding': 'gzip,deflate,sdch',
    'Accept-Language': 'en-US,en;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
    'Cookie': 'PHPSESSID=lvetilatpgs9okgrntk1nvn595'
}

data = {
    'action': 'search',
    'xdata': '<search id="1"><context type="all" /><results><ordering>familyName</ordering><pagesize>100000</pagesize><page>1</page></results><terms><name>d</name><school>All</school></terms></search>',
    'request': 'search'
}

data = urllib.urlencode(data) 
print data 
req = urllib2.Request('https://digitalvita.pitt.edu/dispatcher.php', data, headers) 
response = urllib2.urlopen(req)
the_page = response.read()

soup=BeautifulSoup(the_page)
print soup

Can anyone tell me how to make it work?

user1652287 asked Sep 20 '12 19:09


People also ask

How do I use urllib2 in Python?

urllib2 offers a very simple interface in the form of the urlopen function. Just pass the URL to urlopen() to get a "file-like" handle to the remote data. It also offers a slightly more complex interface for handling common situations like basic authentication, cookies, proxies and so on. These are provided by objects called handlers and openers.

Is urllib2 deprecated?

urllib2 exists only in Python 2. In Python 3 its functionality was split across urllib.request and urllib.error, so use those instead.

What does Urllib request return?

This function always returns an object which can work as a context manager and has the properties url, headers, and status. See urllib.

Is urllib2 same as urllib3?

Despite the similar name, they are unrelated: they have a different design and a different implementation. urllib was the original Python HTTP client, added to the standard library in Python 1.2.


1 Answer

Do not specify a Content-Length header; urllib2 calculates it for you. As it is, your header specifies the wrong length (325), while the encoded body is 319 bytes:

>>> data = urllib.urlencode(data) 
>>> len(data)
319

Without that header the rest of the posted code works fine for me.
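
For reference, here is a minimal sketch of the request with the hard-coded Content-Length removed. The helper names build_search_request and fetch_search_page are hypothetical, the shortened User-Agent is an assumption, and the session id is passed in rather than hard-coded because the cookie from the question has presumably expired. The import fallback is only there so the same sketch also runs on Python 3, where urllib2 became urllib.request.

```python
# Sketch only: runs on Python 2 (urllib2, as in the question) and on Python 3.
try:
    from urllib import urlencode            # Python 2
    from urllib2 import Request, urlopen
except ImportError:
    from urllib.parse import urlencode      # Python 3
    from urllib.request import Request, urlopen

URL = 'https://digitalvita.pitt.edu/dispatcher.php'

# Same search document as in the question, split only for readability.
XDATA = (
    '<search id="1"><context type="all" />'
    '<results><ordering>familyName</ordering>'
    '<pagesize>100000</pagesize><page>1</page></results>'
    '<terms><name>d</name><school>All</school></terms></search>'
)


def build_search_request(session_id):
    """Hypothetical helper: build the POST request without sending it."""
    headers = {
        # No Content-Length (and no Host): the library fills these in itself
        # when the request is sent.
        'User-Agent': 'Mozilla/5.0',  # shortened; an assumption
        'Content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'Referer': 'https://digitalvita.pitt.edu/index.php',
        'Cookie': 'PHPSESSID=' + session_id,
    }
    fields = {'action': 'search', 'xdata': XDATA, 'request': 'search'}
    # The encoded body is 319 bytes, not the 325 claimed by the old header.
    body = urlencode(fields).encode('ascii')  # bytes are required on Python 3
    return Request(URL, body, headers)


def fetch_search_page(session_id):
    """Hypothetical helper: send the request and return the raw response body."""
    response = urlopen(build_search_request(session_id))
    try:
        return response.read()
    finally:
        response.close()
```

Calling fetch_search_page() with a valid PHPSESSID value returns the page, which can then be handed to BeautifulSoup as in the question. Passing a data argument is what makes the request a POST; no explicit method is needed.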

Martijn Pieters answered Oct 30 '22 10:10