Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

login to website using HTMLAgilityPack

In the below code, I can set the value of the username and password using the HTMLAgilitypack but I cannot invoke the click event of the login button (the id in the source code of the button is "s1").

Is there anyway for this to be done? The reason I'm not using the WebBrowser is because I will need the HTMLAgilityPack to retrieve data from the page without IDs in the source code.

var doc = new HtmlWeb().Load("http://MYURL.com");
doc.DocumentNode.SelectSingleNode("name").SetAttributeValue("value", "MyUsername");
doc.DocumentNode.SelectSingleNode("password").SetAttributeValue("value", "MyPassword");
like image 793
touyets Avatar asked Nov 26 '12 16:11

touyets


2 Answers

Is there anyway for this to be done?

Not with what the HTML Agility Pack (HAP) library provides - not directly.

The HAP is great for getting a single page and parsing it, but it is not designed for continued interactions. Things that are missing are cookie management, JavaScript interaction and more.

In order to login you probably need to send an HTTP POST to the server, including the data you want - the HAP can't help with that.

You will need to use a class like WebRequest to make the post - I suggest looking at fiddler and using it to see what the request should look like and constructing it accordingly, though that may just be the first step.

You may want to investigate the use of web automation tools such as selenium or WatiN instead.

like image 113
Oded Avatar answered Sep 23 '22 07:09

Oded


You need to observe the POST request via fiddler and see how it's structured. for instance :

    {"userName":"you","password":"pwd"}

Usually, a site would recognize that you are logged in by receiving their cookie in your requests.

HttpClient by default sends the cookies received from a specific domain with each sequential request to that domain (Until you dispose that HttpClient instance)

1) Create a cookie container and assigned it to your HttpClient instance.

2) Use HttpClient to make the login POST request.

3) Use HttpClient to make the data GET request.

4) Read the html string from the response.

5) Use HtmlAgilityPack HtmlDocument to load the document from the html string and not from the web (as most examples show).

 string baseUrl = "https://www.yourwebsite.com";
 string loginUrl = "/Account/LogOn"; 
 string sessionUrl = "/Data";

 var uri = new Uri(baseUrl);

 CookieContainer cookies = new CookieContainer();
 HttpClientHandler handler = new HttpClientHandler();
 handler.CookieContainer = cookies;

 using (var client = new HttpClient(handler))
 {
       client.BaseAddress = uri;

       var request = new { userName = "you", password = "pwd" };
       var resLogin = client.PostAsJsonAsync(loginUrl,request).Result;
       if (resLogin.StatusCode != HttpStatusCode.OK)
            Console.WriteLine("Could not login -> StatusCode = " + resLogin.StatusCode);

       // see what cookies are returned   
      IEnumerable<Cookie> responseCookies = cookies.GetCookies(uri).Cast<Cookie>();
      foreach (Cookie cookie in responseCookies)
            Console.WriteLine(cookie.Name + ": " + cookie.Value);

      var resData = client.GetAsync(dataUrl).Result;
      if(resSession.StatusCode != HttpStatusCode.OK)
            Console.WriteLine("Could not get data html -> StatusCode = " + resSession.StatusCode);

       var html = resSession.Content.ReadAsStringAsync().Result;

       var doc = new HtmlDocument();
       doc.LoadHtml(html);
 }
like image 38
eran otzap Avatar answered Sep 23 '22 07:09

eran otzap