Parsing HTML String [duplicate]

Question

Is there a way to parse an HTML string in .NET code-behind, similar to DOM parsing?

For example: GetElementByTagName("abc").GetElementByTagName("tag")

I have this code chunk, which downloads the page into a string:

private void LoadProfilePage()
{
    string sURL = "http://www.abcd1234.com/abcd1234";
    WebRequest wrGETURL = WebRequest.Create(sURL);

    //WebProxy myProxy = new WebProxy("myproxy",80);
    //myProxy.BypassProxyOnLocal = true;

    //wrGETURL.Proxy = WebProxy.GetDefaultProxy();

    // Dispose the response and its stream when done so the connection is released.
    using (WebResponse response = wrGETURL.GetResponse())
    using (Stream objStream = response.GetResponseStream())
    {
        if (objStream != null)
        {
            using (StreamReader objReader = new StreamReader(objStream))
            {
                // sLine holds the entire HTML document as a single string.
                string sLine = objReader.ReadToEnd();

                if (!String.IsNullOrEmpty(sLine))
                {
                    ....
                }
            }
        }
    }
}

Oded · Accepted Answer

You can use the excellent HTML Agility Pack.

This is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT (you don't actually have to understand XPath or XSLT to use it). It is a .NET code library that lets you parse "out of the web" HTML files. The parser is very tolerant of "real world" malformed HTML. The object model is very similar to the one System.Xml offers, but for HTML documents (or streams).
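For the case in the question, where the HTML is already in a string (sLine), a minimal sketch might look like the one below. It assumes the HtmlAgilityPack NuGet package is referenced. The tag names abc and tag are the placeholders from the question; the class name, method names, and sample markup are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

// Hypothetical helper class, not part of the library.
public static class HtmlStringSketch
{
    // DOM-style traversal: text of every <tag> found anywhere inside an <abc>.
    public static List<string> ViaDescendants(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // LoadHtml parses a string; Load reads a file or stream

        return doc.DocumentNode
                  .Descendants("abc")
                  .SelectMany(abc => abc.Descendants("tag"))
                  .Select(tag => tag.InnerText.Trim())
                  .ToList();
    }

    // XPath version; selects the same nodes as long as <abc> elements are not nested.
    public static List<string> ViaXPath(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//abc//tag");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (nodes == null)
        {
            return new List<string>();
        }
        return nodes.Select(n => n.InnerText.Trim()).ToList();
    }

    public static void Main()
    {
        // Invented sample markup; in the question this would be sLine.
        string html = "<html><body><abc><tag>first</tag><tag>second</tag></abc></body></html>";

        foreach (string text in ViaDescendants(html))
        {
            Console.WriteLine(text);
        }
    }
}
```

Descendants is the closest match to chained GetElementByTagName calls. The XPath form is shorter, but it needs the null check.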

Mark Coleman · Answer

Take a look at the HTML Agility Pack.

Example of its use:

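 HtmlDocument doc = new HtmlDocument();
 doc.Load("file.htm");
 // SelectNodes returns null when nothing matches, so check before iterating.
 HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
 if (links != null)
 {
    foreach (HtmlNode link in links)
    {
       HtmlAttribute att = link.Attributes["href"];
       att.Value = FixLink(att); // FixLink is your own helper, not part of the library
    }
 }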

Tags:

html

c#

.net

parsing

S M Kamran
