So, I've been scouring the web trying to learn more about how to log into websites programmatically using C#. I don't want to use a web client. I think I want to use something like HttpWebRequest and HttpWebResponse, but I have no idea how these classes work.
I guess I'm looking for someone to explain how they work and the steps required to successfully log in to, say, WordPress, an email account, or any site that requires that you fill in a form with a username and password.
Here's one of my attempts:
// Declare variables
string url = textBoxGetSource.Text;
string username = textBoxUsername.Text;
string password = PasswordBoxPassword.Password;
// Values for site login fields - username and password html ID's
string loginUsernameID = textBoxUsernameID.Text;
string loginPasswordID = textBoxPasswordID.Text;
string loginSubmitID = textBoxSubmitID.Text;
// Connection parameters
string method = "POST";
string contentType = @"application/x-www-form-urlencoded";
string loginString = loginUsernameID + "=" + username + "&" + loginPasswordID + "=" + password + "&" + loginSubmitID;
CookieContainer cookieJar = new CookieContainer();
HttpWebRequest request;
request = (HttpWebRequest)WebRequest.Create(url);
request.CookieContainer = cookieJar;
request.Method = method;
request.ContentType = contentType;
request.KeepAlive = true;
using (Stream requestStream = request.GetRequestStream())
using (StreamWriter writer = new StreamWriter(requestStream))
{
writer.Write(loginString, username, password);
}
using (var responseStream = request.GetResponse().GetResponseStream())
using (var reader = new StreamReader(responseStream))
{
var result = reader.ReadToEnd();
Console.WriteLine(result);
richTextBoxSource.AppendText(result);
}
MessageBox.Show("Successfully logged in.");
I don't know if I'm on the right track or not. I end up being returned back to the login screen of whatever site I try. I've downloaded Fiddler and was able to glean a little bit of information about what information is sent to the server, but I feel completely lost. If anyone could shed some light here, I would greatly appreciate it.
Logging into websites programatically is difficult and tightly coupled with how the site implements its login procedure. The reason your code isn't working is because you aren't dealing with any of this in your requests/responses.
Let's take fif.com for example. When you type in a username and password, the following post request gets sent:
POST https://fif.com/login?task=user.login HTTP/1.1
Host: fif.com
Connection: keep-alive
Content-Length: 114
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Origin: https://fif.com
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.103 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Referer: https://fif.com/login?return=...==
Accept-Encoding: gzip,deflate
Accept-Language: en-US,en;q=0.8
Cookie: 34f8f7f621b2b411508c0fd39b2adbb2=gnsbq7hcm3c02aa4sb11h5c87f171mh3; __utma=175527093.69718440.1410315941.1410315941.1410315941.1; __utmb=175527093.12.10.1410315941; __utmc=175527093; __utmz=175527093.1410315941.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmv=175527093.|1=RegisteredUsers=Yes=1
username=...&password=...&return=aHR0cHM6Ly9maWYuY29tLw%3D%3D&9a9bd5b68a7a9e5c3b06ccd9b946ebf9=1
Notice the cookies (especially the first, your session token). Notice the cryptic url-encoded return value being sent. If the server notices these are missing, it won't let you login.
HTTP/1.1 400 Bad Request
Or worse, a 200 response of a login page with an error message buried somewhere inside.
But let's just pretend you were able to collect all of those magic values and pass them in an HttpWebRequest object. The site wouldn't know the difference. And it might respond with something like this.
HTTP/1.1 303 See other
Server: nginx
Date: Wed, 10 Sep 2014 02:29:09 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Location: https://fif.com/
Hope you were expecting that. But if you've made it this far, you can now programatically fire off requests to the server with your now validated session token and get the expected HTML back.
GET https://fif.com/ HTTP/1.1
Host: fif.com
Connection: keep-alive
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.103 Safari/537.36
Referer: https://fif.com/login?return=aHR0cHM6Ly9maWYuY29tLw==
Accept-Encoding: gzip,deflate
Accept-Language: en-US,en;q=0.8
Cookie: 34f8f7f621b2b411508c0fd39b2adbb2=gnsbq7hcm3c02aa4sb11h5c87f171mh3; __utma=175527093.69718440.1410315941.1410315941.1410315941.1; __utmb=175527093.12.10.1410315941; __utmc=175527093; __utmz=175527093.1410315941.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmv=175527093.|1=RegisteredUsers=Yes=1
And this is all for fif.com - this juggling of cookies and tokens and redirects will be completely different for another site. In my experience (with that site in particular), you have three options to get through the login wall.
Selenium can handle all the juggling, and at the end you can pull the cookies out and fire off your requests normally. Here's an example for fif:
//Run selenium
ChromeDriver cd = new ChromeDriver(@"chromedriver_win32");
cd.Url = @"https://fif.com/login";
cd.Navigate();
IWebElement e = cd.FindElementById("username");
e.SendKeys("...");
e = cd.FindElementById("password");
e.SendKeys("...");
e = cd.FindElementByXPath(@"//*[@id=""main""]/div/div/div[2]/table/tbody/tr/td[1]/div/form/fieldset/table/tbody/tr[6]/td/button");
e.Click();
CookieContainer cc = new CookieContainer();
//Get the cookies
foreach(OpenQA.Selenium.Cookie c in cd.Manage().Cookies.AllCookies)
{
string name = c.Name;
string value = c.Value;
cc.Add(new System.Net.Cookie(name,value,c.Path,c.Domain));
}
//Fire off the request
HttpWebRequest hwr = (HttpWebRequest) HttpWebRequest.Create("https://fif.com/components/com_fif/tools/capacity/values/");
hwr.CookieContainer = cc;
hwr.Method = "POST";
hwr.ContentType = "application/x-www-form-urlencoded";
StreamWriter swr = new StreamWriter(hwr.GetRequestStream());
swr.Write("feeds=35");
swr.Close();
WebResponse wr = hwr.GetResponse();
string s = new System.IO.StreamReader(wr.GetResponseStream()).ReadToEnd();
Checkout this post. It's another way of doing it and you don't need to install any package although it might be easier with Selenium.
"You can continue using WebClient to POST (instead of GET, which is the HTTP verb you're currently using with DownloadString), but I think you'll find it easier to work with the (slightly) lower-level classes WebRequest and WebResponse.
There are two parts to this - the first is to post the login form, the second is recovering the "Set-cookie" header and sending that back to the server as "Cookie" along with your GET request. The server will use this cookie to identify you from now on (assuming it's using cookie-based authentication which I'm fairly confident it is as that page returns a Set-cookie header which includes "PHPSESSID").
POSTing to the login form
Form posts are easy to simulate, it's just a case of formatting your post data as follows:
field1=value1&field2=value2
Using WebRequest and code I adapted from Scott Hanselman, here's how you'd POST form data to your login form:
string formUrl = "http://www.mmoinn.com/index.do?PageModule=UsersAction&Action=UsersLogin";
NOTE: This is the URL the form POSTs to, not the URL of the form (you can find this in the "action" attribute of the HTML's form tag
string formParams = string.Format("email_address={0}&password={1}", "your email", "your password"); string cookieHeader; WebRequest req = WebRequest.Create(formUrl); req.ContentType = "application/x-www-form-urlencoded"; req.Method = "POST"; byte[] bytes = Encoding.ASCII.GetBytes(formParams); req.ContentLength = bytes.Length; using (Stream os = req.GetRequestStream()) { os.Write(bytes, 0, bytes.Length); } WebResponse resp = req.GetResponse(); cookieHeader = resp.Headers["Set-cookie"];
Here's an example of what you should see in the Set-cookie header for your login form:
PHPSESSID=c4812cffcf2c45e0357a5a93c137642e; path=/; domain=.mmoinn.com,wowmine_referer=directenter; path=/;
domain=.mmoinn.com,lang=en; path=/;domain=.mmoinn.com,adt_usertype=other,adt_host=-
GETting the page behind the login form
Now you can perform your GET request to a page that you need to be logged in for.
string pageSource; string getUrl = "the url of the page behind the login"; WebRequest getRequest = WebRequest.Create(getUrl); getRequest.Headers.Add("Cookie", cookieHeader); WebResponse getResponse = getRequest.GetResponse(); using (StreamReader sr = new StreamReader(getResponse.GetResponseStream())) { pageSource = sr.ReadToEnd(); }
EDIT:
If you need to view the results of the first POST, you can recover the HTML it returned with:
using (StreamReader sr = new StreamReader(resp.GetResponseStream())) { pageSource = sr.ReadToEnd(); }
Place this directly below
cookieHeader = resp.Headers["Set-cookie"];
and then inspect the string held in pageSource."
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With