
Scraping a website to get the element name and id through C# web browser

Tags:

html

c#

I am trying to scrape a website to get the textarea information.

I'm using:

HtmlDocument doc = this.webBrowser1.Document;

When I look at the page source, it shows <textarea name="message" class="profile">

But when I try to access this textarea with:

 HtmlDocument doc = this.webBrowser1.Document;

 doc.GetElementsByTagName("textarea")
      .GetElementsByName("message")[0]
      .SetAttribute("value", "Hello");

It shows the error:

 Value of '0' is not valid for 'index'. 'index' should be between 0 and -1.
Parameter name: index

Any help?

IceDawg asked Oct 22 '22 11:10


2 Answers

For your current need you can simply use this:

doc.GetElementsByTagName("textarea")[0].InnerText = "Hello";

For more complex tasks, you can use the HtmlDocument class together with the MSHTML library.

Santosh Panda answered Oct 24 '22 03:10


I can recommend HtmlAgilityPack to you!

I suspect you are trying to access a website that uses cookies to determine whether a user is logged in. If you are not, it forces you to register or log in; otherwise you aren't allowed to see anything. Am I right?

Your browser stores those cookies; your C# code does not (broadly speaking).
You need to create a cookie container (CookieContainer) to solve that problem.

Your C# app can log in, receive a cookie/session from the server, read the cookies from the response headers, and then you should be able to scrape the profiles or whatever else you want.
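A minimal sketch of the cookie part, assuming .NET's System.Net.CookieContainer. When a CookieContainer is assigned to an HttpWebRequest's CookieContainer property, cookies from the server's response are stored in it automatically; here CookieContainer.SetCookies is called directly only to simulate that step without a network call. The login URL, the profile.php page and the SessionID value are hypothetical.

```csharp
using System;
using System.Net;

public static class CookieDemo
{
    // Stores the cookies from a login response's Set-Cookie header value.
    // HttpWebRequest does the equivalent automatically when its
    // CookieContainer property is set before the request is sent.
    public static CookieContainer StoreLoginCookies(Uri loginUri, string setCookieHeader)
    {
        var cookies = new CookieContainer();
        cookies.SetCookies(loginUri, setCookieHeader);
        return cookies;
    }

    public static void Main()
    {
        // Hypothetical login URL and session cookie, for illustration only.
        var login = new Uri("http://www.website.com/login.php");
        CookieContainer cookies = StoreLoginCookies(login, "SessionID=abc123; path=/");

        // The container builds the Cookie header for later pages on the same host,
        // e.g. a (hypothetical) profile page you want to scrape after logging in.
        Console.WriteLine(cookies.GetCookieHeader(new Uri("http://www.website.com/profile.php")));

        // Nothing is sent to a different host (prints the header length: 0).
        Console.WriteLine(cookies.GetCookieHeader(new Uri("http://www.example.org/")).Length);
    }
}
```

To keep the session, pass the same CookieContainer instance to every later request (request.CookieContainer = cookies) instead of creating a new one each time.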
Get the POST data that is sent to the server. You can use tools/add-ons like Fiddler, Tamper, etc.

For example, the POST data string: user_name=TESTUSER&password=TESTPASSWORD&language=en&action%3Asubmit=Submit
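If you build that string from user input (for example a password containing '&' or a space), each name and value should be URL-encoded separately. A minimal sketch, assuming System.Net.WebUtility.UrlEncode for application/x-www-form-urlencoded data; the PostData class and Build method are hypothetical helpers, and the field names are the ones from the example string above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

public static class PostData
{
    // Builds an application/x-www-form-urlencoded body. Each name and value is
    // encoded separately, so characters such as '&', '=' or ' ' inside a value
    // cannot break the name=value&name=value structure.
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            WebUtility.UrlEncode(f.Key) + "=" + WebUtility.UrlEncode(f.Value)));
    }

    public static void Main()
    {
        // The ':' in "action:submit" is encoded as %3A, as in the captured POST data.
        string body = Build(new[]
        {
            new KeyValuePair<string, string>("user_name", "TESTUSER"),
            new KeyValuePair<string, string>("password", "TESTPASSWORD"),
            new KeyValuePair<string, string>("language", "en"),
            new KeyValuePair<string, string>("action:submit", "Submit"),
        });
        Console.WriteLine(body);
    }
}
```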

Here is a snippet you can use.

        using System.IO;
        using System.Net;
        using System.Text;

        //Create the PostData; URL-encode user input so characters like '&' don't break it
        string strPostData = "user_name=" + WebUtility.UrlEncode(txtUser.Text)
            + "&password=" + WebUtility.UrlEncode(txtPass.Text)
            + "&language=en&action%3Asubmit=Submit";
        CookieContainer tempCookies = new CookieContainer();
        byte[] data = Encoding.ASCII.GetBytes(strPostData);

        //Create the login request and attach the cookie container,
        //so cookies set by the server are stored in tempCookies
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.website.com/login.php");
        request.CookieContainer = tempCookies;
        request.Method = "POST";
        request.KeepAlive = true;
        request.AllowAutoRedirect = false;
        //Header values only; the property already supplies the header name
        request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        request.ContentType = "application/x-www-form-urlencoded";
        request.Referer = "http://www.website.com/login.php";
        request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20100101 Firefox/14.0.1";
        request.ContentLength = data.Length;

        //Write the POST body and close the stream before asking for the response
        using (Stream requestStream = request.GetRequestStream())
        {
            requestStream.Write(data, 0, data.Length);
        }

        //Stream(-output) of the new website
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader postReqReader = new StreamReader(response.GetResponseStream()))
        {
            //Raw response headers (e.g. Set-Cookie), useful for debugging
            string sResponseHeaders = Convert.ToString(response.Headers);

            //RichTextBox to see the new source.
            richTextBox1.Text = postReqReader.ReadToEnd();
        }

You may need to adjust the cookie parameters between requests and add your current session ID to the code as well. This depends on the website you are requesting.
For example:

        request.Headers.Add("Cookie", "language=en_US.UTF-8; StationID=" + sStationID + "; SessionID=" + sSessionID);
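Alternatively, if you already know the StationID and SessionID values, a sketch assuming System.Net.CookieContainer: add them as Cookie objects and let the container build the Cookie header for every matching request, instead of concatenating the header string by hand. The ManualCookies class and the ID values are hypothetical; the resulting container would be assigned to request.CookieContainer.

```csharp
using System;
using System.Net;

public static class ManualCookies
{
    // Registers known cookie values for a site. Cookies added without an
    // explicit path or domain take them from the given URI (here: host-only, path "/").
    public static CookieContainer Create(Uri site, string stationId, string sessionId)
    {
        var cookies = new CookieContainer();
        cookies.Add(site, new Cookie("language", "en_US.UTF-8"));
        cookies.Add(site, new Cookie("StationID", stationId));
        cookies.Add(site, new Cookie("SessionID", sessionId));
        return cookies;
    }

    public static void Main()
    {
        // Hypothetical IDs, for illustration only.
        CookieContainer cookies = Create(new Uri("http://www.website.com/"), "42", "abc123");

        // The Cookie header the container would send for a page on that site.
        Console.WriteLine(cookies.GetCookieHeader(new Uri("http://www.website.com/profile.php")));
    }
}
```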

MrMAG answered Oct 24 '22 04:10