I need some information from a website that's not mine, in order to get this information I need to login to the website to gather the information, this happens through a HTML form. How can I do this authenticated screenscaping in C#?
Extra information:
You'd make the request as though you'd just filled out the form. Assuming it's POST for example, you make a POST request with the correct data. Now if you can't login directly to the same page you want to scrape, you will have to track whatever cookies are set after your login request, and include them in your scraping request to allow you to stay logged in.
It might look like:
HttpWebRequest http = WebRequest.Create(url) as HttpWebRequest;
http.KeepAlive = true;
http.Method = "POST";
http.ContentType = "application/x-www-form-urlencoded";
string postData="FormNameForUserId=" + strUserId + "&FormNameForPassword=" + strPassword;
byte[] dataBytes = UTF8Encoding.UTF8.GetBytes(postData);
http.ContentLength = dataBytes.Length;
using (Stream postStream = http.GetRequestStream())
{
postStream.Write(dataBytes, 0, dataBytes.Length);
}
HttpWebResponse httpResponse = http.GetResponse() as HttpWebResponse;
// Probably want to inspect the http.Headers here first
http = WebRequest.Create(url2) as HttpWebRequest;
http.CookieContainer = new CookieContainer();
http.CookieContainer.Add(httpResponse.Cookies);
HttpWebResponse httpResponse2 = http.GetResponse() as HttpWebResponse;
Maybe.
You can use a WebBrowser control. Just feed it the URL of the site, then use the DOM to set the username and password into the right fields, and eventually send a click to the submit button. This way you don't care about anything but the two input fields and the submit button. No cookie handling, no raw HTML parsing, no HTTP sniffing - all that is done by the browser control.
If you go that way, a few more suggestions:
For some cases, httpResponse.Cookies
will be blank. Use the CookieContainer
instead.
CookieContainer cc = new CookieContainer();
HttpWebRequest http = WebRequest.Create(url) as HttpWebRequest;
http.KeepAlive = true;
http.Method = "POST";
http.ContentType = "application/x-www-form-urlencoded";
http.CookieContainer = cc;
string postData="FormNameForUserId=" + strUserId + "&FormNameForPassword=" + strPassword;
byte[] dataBytes = UTF8Encoding.UTF8.GetBytes(postData);
http.ContentLength = dataBytes.Length;
using (Stream postStream = http.GetRequestStream())
{
postStream.Write(dataBytes, 0, dataBytes.Length);
}
HttpWebResponse httpResponse = http.GetResponse() as HttpWebResponse;
// Probably want to inspect the http.Headers here first
http = WebRequest.Create(url2) as HttpWebRequest;
http.CookieContainer = cc;
HttpWebResponse httpResponse2 = http.GetResponse() as HttpWebResponse;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With