I would like to develop an automatic scraper for a password-protected ASP web page. I have a login and password for this page.
First, I looked at the Firebug log while logging in with Firefox. This is the request sequence I found:
1. http://mysite
2. http://mysite/Account/Login with the parameters UserName, Password and __RequestVerificationToken; it also uses the cookie saved in step 1
3. http://mysite/Account/Index (the page I want to scrape)
My code:
//1. Get __RequestVerificationToken cookie
$urlLogin = "http://mysite";
$cookieFile = "cookie.txt";
$regs=array();
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $urlLogin);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_VERBOSE, TRUE);
curl_setopt($ch, CURLOPT_STDERR,$f = fopen("answer.txt", "w+"));
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1; rv:18.0) Gecko/20100101 Firefox/18.0' );
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
$data=curl_exec($ch);
//2. Parse token value for the post request
$hash=file_get_contents("answer.txt");
preg_match_all('/=(.*); p/i',$hash, $regs);
//3. Make a post request
$postData = '__RequestVerificationToken='.$regs[1][0].'&UserName=someLogin'.'&Password=somePassword';
$urlSecuredPage = "http://mysite/Account/Login";
curl_setopt($ch, CURLOPT_URL, $urlSecuredPage);
curl_setopt($ch, CURLOPT_POST, TRUE);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
$data = curl_exec($ch);
curl_close($ch);
At step 3, the cookie saved in step 1 is overwritten with a new value of __RequestVerificationToken. I don't understand why this happens. As a result, I cannot log in because of the wrong __RequestVerificationToken, and I get an HTTP 500 error.
Where am I going wrong?
There should be two parts to __RequestVerificationToken: one in a hidden input value and one in the cookie. The hidden input value is sent with each request, and it has a new value for each request. It depends on the cookie value.
So you need to save both the input value and the cookie, and send them back together. If you don't send the value from the hidden input, ASP.NET MVC treats the request as an attack and generates a new cookie. A new cookie is generated only if validation failed or the cookie itself doesn't exist. If you have that cookie and always send the __RequestVerificationToken input value with the POST request, it shouldn't generate a new cookie.
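The advice above can be sketched roughly as follows. This is only an illustration under assumptions: it assumes the page returned in step 1 contains the login form with an `<input name="__RequestVerificationToken" type="hidden" ...>` element, and the helper names `extractVerificationToken`, `buildLoginBody` and `fetchSecuredPage` are made up for this example. The token is read from the HTML rather than from the cookie header, and the same cURL handle is reused so the cookie travels with the POST.

```php
<?php
// Hypothetical helper: returns the hidden __RequestVerificationToken
// input value from the page HTML, or null if it is not present.
function extractVerificationToken(string $html): ?string
{
    if ($html === '') {
        return null;
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//input[@name="__RequestVerificationToken"]/@value');
    return ($nodes !== false && $nodes->length > 0) ? $nodes->item(0)->nodeValue : null;
}

// Hypothetical helper: builds the URL-encoded POST body. http_build_query
// encodes characters such as +, / and = that may appear in the token.
function buildLoginBody(string $token, string $user, string $password): string
{
    return http_build_query([
        '__RequestVerificationToken' => $token,
        'UserName' => $user,
        'Password' => $password,
    ]);
}

// Sketch of the whole flow. It is not called here because it needs the real site.
function fetchSecuredPage(string $baseUrl, string $user, string $password, string $cookieFile): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEJAR => $cookieFile,   // write cookies here on close
        CURLOPT_COOKIEFILE => $cookieFile,  // enable the cookie engine
    ]);

    // 1. GET the form: the response sets the cookie and carries the hidden input.
    curl_setopt($ch, CURLOPT_URL, $baseUrl);
    $loginPage = curl_exec($ch);
    $token = is_string($loginPage) ? extractVerificationToken($loginPage) : null;
    if ($token === null) {
        curl_close($ch);
        return null;
    }

    // 2. POST the hidden input value together with the cookie from step 1.
    curl_setopt($ch, CURLOPT_URL, $baseUrl . '/Account/Login');
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, buildLoginBody($token, $user, $password));
    curl_exec($ch);

    // 3. Switch back to GET and request the page to scrape.
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    curl_setopt($ch, CURLOPT_URL, $baseUrl . '/Account/Index');
    $page = curl_exec($ch);
    curl_close($ch);
    return is_string($page) ? $page : null;
}

// Offline example with a made-up token value.
$sampleHtml = '<form action="/Account/Login" method="post">'
    . '<input name="__RequestVerificationToken" type="hidden" value="abc123+/=" />'
    . '</form>';
$sampleToken = extractVerificationToken($sampleHtml);
$sampleBody = buildLoginBody($sampleToken, 'someLogin', 'somePassword');
```

With the real site, the call would look like `fetchSecuredPage('http://mysite', 'someLogin', 'somePassword', 'cookie.txt')`.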
If it is still generated, then you are sending an incorrect __RequestVerificationToken from the hidden input value. Try making the same requests from Fiddler or Charles and check whether they return a successful result.
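It can also help to route the script's own requests through the same debugging proxy, so they can be compared side by side with the browser's. A small sketch, assuming the proxy listens on 127.0.0.1:8888 (the usual default for Fiddler and Charles, but check your settings); the helper names `debugProxyOptions` and `isServerError` are hypothetical.

```php
<?php
// Hypothetical helper: cURL options that send traffic through a local
// debugging proxy so each request and response can be inspected there.
function debugProxyOptions(string $proxy = '127.0.0.1:8888'): array
{
    return [
        CURLOPT_PROXY => $proxy,
        CURLOPT_RETURNTRANSFER => true,
    ];
}

// Hypothetical helper: true for 5xx responses such as the HTTP 500
// described in the question.
function isServerError(int $status): bool
{
    return $status >= 500 && $status <= 599;
}

// Usage (needs the real site and a running proxy):
// $ch = curl_init('http://mysite/Account/Login');
// curl_setopt_array($ch, debugProxyOptions());
// curl_exec($ch);
// var_dump(isServerError(curl_getinfo($ch, CURLINFO_HTTP_CODE)));
```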
These tokens are used to prevent CSRF attacks.