Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

post request using python to asp.net page

i want scrap the PINCODEs from "http://www.indiapost.gov.in/pin/", i am doing with following code written.

import urllib
import urllib2
headers = {
    'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Origin': 'http://www.indiapost.gov.in',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko)  Chrome/24.0.1312.57 Safari/537.17',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Referer': 'http://www.indiapost.gov.in/pin/',
    'Accept-Encoding': 'gzip,deflate,sdch',
    'Accept-Language': 'en-US,en;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3'
}
viewstate = 'JulXDv576ZUXoVOwThQQj4bDuseXWDCZMP0tt+HYkdHOVPbx++G8yMISvTybsnQlNN76EX/...'
eventvalidation = '8xJw9GG8LMh6A/b6/jOWr970cQCHEj95/6ezvXAqkQ/C1At06MdFIy7+iyzh7813e1/3Elx...'
url = 'http://www.indiapost.gov.in/pin/'
formData = (
    ('__EVENTVALIDATION', eventvalidation),
    ('__EVENTTARGET',''),
    ('__EVENTARGUMENT',''),
    ('__VIEWSTATE', viewstate),
    ('__VIEWSTATEENCRYPTED',''),
    ('__EVENTVALIDATION', eventvalidation),
    ('txt_offname',''),
    ('ddl_dist','0'),
    ('txt_dist_on',''),
    ('ddl_state','2'),
    ('btn_state','Search'),
    ('txt_stateon',''),
    ('hdn_tabchoice','3')
)


from urllib import FancyURLopener
class MyOpener(FancyURLopener):
    version = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17'

myopener = MyOpener()


encodedFields = urllib.urlencode(formData)

f = myopener.open(url, encodedFields)
print f.info()

try:
fout = open('tmp.txt', 'w')
except:
print('Could not open output file\n')

fout.writelines(f.readlines())
fout.close()

i am getting response from server as "Sorry this site has encountered a serious problem, please try reloading the page or contact webmaster." pl suggest where i am going wrong..

like image 926
user2049919 Avatar asked Feb 07 '13 08:02

user2049919


1 Answers

Where did you get the value viewstate and eventvalidation? On one hand, they shouldn't end with "...", you must have omitted something. On the other hand, they shouldn't be hard-coded.

One solution is like this:

  1. Retrieve the page via URL "http://www.indiapost.gov.in/pin/" without any form data
  2. Parse and retrieve the form values like __VIEWSTATE and __EVENTVALIDATION (you may take use of BeautifulSoup).
  3. Get the search result(second HTTP request) by adding vital form-data from step 2.

UPDATE:

According to the above idea, I modify your code slightly to make it work:

import urllib
from bs4 import BeautifulSoup

headers = {
    'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Origin': 'http://www.indiapost.gov.in',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko)  Chrome/24.0.1312.57 Safari/537.17',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Referer': 'http://www.indiapost.gov.in/pin/',
    'Accept-Encoding': 'gzip,deflate,sdch',
    'Accept-Language': 'en-US,en;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3'
}

class MyOpener(urllib.FancyURLopener):
    version = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17'

myopener = MyOpener()
url = 'http://www.indiapost.gov.in/pin/'
# first HTTP request without form data
f = myopener.open(url)
soup = BeautifulSoup(f)
# parse and retrieve two vital form values
viewstate = soup.select("#__VIEWSTATE")[0]['value']
eventvalidation = soup.select("#__EVENTVALIDATION")[0]['value']

formData = (
    ('__EVENTVALIDATION', eventvalidation),
    ('__VIEWSTATE', viewstate),
    ('__VIEWSTATEENCRYPTED',''),
    ('txt_offname', ''),
    ('ddl_dist', '0'),
    ('txt_dist_on', ''),
    ('ddl_state','1'),
    ('btn_state', 'Search'),
    ('txt_stateon', ''),
    ('hdn_tabchoice', '1'),
    ('search_on', 'Search'),
)

encodedFields = urllib.urlencode(formData)
# second HTTP request with form data
f = myopener.open(url, encodedFields)

try:
    # actually we'd better use BeautifulSoup once again to
    # retrieve results(instead of writing out the whole HTML file)
    # Besides, since the result is split into multipages,
    # we need send more HTTP requests
    fout = open('tmp.html', 'w')
except:
    print('Could not open output file\n')
fout.writelines(f.readlines())
fout.close()
like image 144
Hui Zheng Avatar answered Oct 07 '22 13:10

Hui Zheng