Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

why HTML Agility Pack HtmlDocument.DocumentNode is null?

I'm using this code to change the href attribute of a HTML stream.

first I download a full html page using this code:(URL is webpage address)

HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create(URL);
HttpWebResponse myHttpWebResponse = 
                         (HttpWebResponse)myHttpWebRequest.GetResponse();

Stream s = myHttpWebResponse.GetResponseStream();

then I process this:

HtmlDocument doc = new HtmlDocument();

doc.Load(s);
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("/a"))
{
    string att = link.Attributes["href"].Value;
    link.Attributes["href"].Value = "http://ahmadalli.somee.com/default.aspx?url=" + att;
}
doc.Save(s);

s is html stream.

but I've got an exception that says doc.DocumentNode is null!

i tried many sites but doc.DocumentNode is null to

like image 603
ahmadali shafiee Avatar asked Feb 04 '12 07:02

ahmadali shafiee


People also ask

Is HTML agility pack free?

Html Agility Pack (HAP) is a free and open-source HTML parser written in C# to read/write DOM and supports plain XPATH or XSLT. It is a . NET code library that allows you to parse "out of the web" HTML files.

What is HTML agility pack?

For users who are unafamiliar with “HTML Agility Pack“, this is an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT. In simple words, it is a . NET code library that allows you to parse “out of the web” files (be it HTML, PHP or aspx).


1 Answers

This works for me.

using(WebClient client = new WebClient())
{
    client.Encoding = System.Text.Encoding.UTF8;
    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(client.DownloadString("http://www.google.com?q=stackoverflow"));
    foreach (var href in doc.DocumentNode.Descendants("a").Select(x => x.Attributes["href"]))
    {
        if (href == null) continue;
        href.Value = "http://ahmadalli.somee.com/default.aspx?url=" + HttpUtility.UrlEncode(href.Value);
    }
    StringWriter writer = new StringWriter();
    doc.Save(writer);
    var finalHtml = writer.ToString();
}

Also see the HttpUtility.UrlEncode to be able to get the url back correctly. Otherwise, some parameters in original url may cause problem.

Use HttpUtility.UrlDecode to decode it.

like image 105
L.B Avatar answered Sep 22 '22 23:09

L.B