I've been battling with this for hours now and have read several threads on SO and elsewhere, but I've hit a dead end. I am trying to scrape a web page, and one of the items I need is a text file that is accessed by clicking a button, which performs a postback and responds with the text file (or, actually, with a dialog box asking me to download it).
I've logged into the site by creating a requests session with login info, and I can get to the page where I need to download the file. However, I can't get it to respond with the text file I want; I keep getting the HTML of the page that has the postback link.
I looked at the parameters to the postback in dev tools and copied them into a dict, and then tried to post to the same URL again with those items as the payload, but I still get the same HTML.
Here is a simplified example (the parameter strings are thousands of lines long). All the below is after successfully logging in to the main page by posting login data.
import requests

# login_url, login_info and stuff_url are defined elsewhere
ajaxpost = {
    '__VIEWSTATE': 'lotsoftext',
    '__EVENTVALIDATION': 'seriously_too_much_text',
    's$maincontent$showitem': 'true',
    's$maincontent$date': 'Monday August 17 2015',
    's$maincontent$xyz': '',
}
r = requests.Session()
r.post(login_url, data=login_info)
stuff = r.post(stuff_url, data=ajaxpost)
print(stuff.text)  # still HTML from the above site, not the text file I want
It may be worth noting that the __doPostBack function only has two explicitly visible parameters, as shown below. However, when I inspect the post in dev tools I see additional fields being posted (the three with s$ at the beginning), so I included them as well. Also, the outrageously long text fields (__VIEWSTATE and __EVENTVALIDATION) appear to change each time I post, so I'm not sure whether that is an issue, or whether there is an easy way to pull the data directly out of the website without having to view the post parameters in dev tools.
var theForm = document.forms['form1'];
if (!theForm) {
    theForm = document.form1;
}
function __doPostBack(eventTarget, eventArgument) {
    if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
        theForm.__EVENTTARGET.value = eventTarget;
        theForm.__EVENTARGUMENT.value = eventArgument;
        theForm.submit();
    }
}
This looks very much like an ASP.NET website, and gosh are they irritating to parse.
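The viewstate changes with every action you perform on the website, so values copied from dev tools may no longer be valid by the time you post them. The two hidden fields __EVENTTARGET and __EVENTARGUMENT are filled in by the __doPostBack JavaScript right before the form is submitted.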
This means that you have two/three options:
If you're interested, send me a page source and I'll give you the specifics of the code.
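Without seeing the page, here is a rough sketch of the general idea: fetch the page first, read the current hidden fields out of the HTML, fill in __EVENTTARGET and __EVENTARGUMENT the way __doPostBack would, and post the result. The helper names (`HiddenFieldParser`, `build_postback_payload`, `download_via_postback`) are made up for illustration, and the event target string is an assumption: you would need to take the real one from the button's `__doPostBack('...','')` call in the page source.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback_payload(page_html, event_target, event_argument="", extra=None):
    """Build the form data that __doPostBack(event_target, event_argument) would submit."""
    parser = HiddenFieldParser()
    parser.feed(page_html)
    payload = dict(parser.fields)  # fresh __VIEWSTATE, __EVENTVALIDATION, ...
    payload["__EVENTTARGET"] = event_target
    payload["__EVENTARGUMENT"] = event_argument
    payload.update(extra or {})  # visible form fields, e.g. s$maincontent$date
    return payload


def download_via_postback(session, url, event_target, extra=None):
    """GET the page for fresh tokens, then POST the simulated postback.

    `session` is assumed to be an already logged-in requests.Session
    (anything with compatible .get/.post methods works).
    """
    page = session.get(url)
    payload = build_postback_payload(page.text, event_target, extra=extra)
    return session.post(url, data=payload)
```

With the logged-in session `r` from the question, this would be called as something like `resp = download_via_postback(r, stuff_url, 'the$event$target', extra={'s$maincontent$showitem': 'true', 's$maincontent$date': 'Monday August 17 2015', 's$maincontent$xyz': ''})`. If the server answers with the file, `resp.text` should contain it, and the response headers (for example Content-Disposition) are worth checking to confirm you got a download rather than the page again.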