
Removing an element by class name with HtmlAgilityPack in C#

I'm using the Html Agility Pack to read the contents of my HTML document into a string. After this is done, I would like to remove certain elements from that content by their class, but I have run into a problem.

My HTML looks like this:

<div id="wrapper">
    <div class="maincolumn" >
        <div class="breadCrumbContainer">
            <div class="breadCrumbs">
            </div>
        </div>

        <div class="seo_list">
            <div class="seo_head">Header</div>
        </div>

        Content goes here...
    </div>
</div>

Now, I have used an XPath selector to get all the content within the wrapper div and used the InnerHtml property like so:

            node = doc.DocumentNode.SelectSingleNode("//div[@id='wrapper']");
            if (node != null)
            {
                pageContent = node.InnerHtml;
            }

From this point, I would like to remove the div with the class of "breadCrumbContainer", however when using the code below, I get the error: "Node "" was not found in the collection"

            node = doc.DocumentNode.SelectSingleNode("//div[@id='wrapper']");
            node = node.RemoveChild(node.SelectSingleNode("//div[@class='breadCrumbContainer']"));

            if (node != null)
            {
                pageContent = node.InnerHtml;
            }

Can anyone shed some light on this please? I'm quite new to XPath, and really new to the HtmlAgilityPack library.

Thanks,

Dave

asked Mar 07 '11 10:03 by Dave


1 Answer

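It's because RemoveChild can only remove a direct child, not a grandchild. In your markup the breadCrumbContainer div sits inside the maincolumn div, so it is not a direct child of the wrapper div. Try this instead: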

    HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@class='breadCrumbContainer']");
    node.ParentNode.RemoveChild(node);
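
For completeness, here is a minimal sketch of the whole flow from the question: select the wrapper, remove the unwanted div from its own parent, then read the wrapper's InnerHtml. The class name `WrapperCleaner` and the method `InnerHtmlWithout` are hypothetical helpers made up for this illustration, not part of HtmlAgilityPack, and the sample markup is a shortened version of the HTML in the question. As far as I know, RemoveChild returns the node that was removed, so the sketch keeps the wrapper in its own variable rather than assigning the result back to it as the question's code does. It also uses `.//` so the search starts at the wrapper instead of the document root.

```csharp
using System;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class WrapperCleaner
{
    // Hypothetical helper: returns the inner HTML of <div id="wrapper"> after
    // removing every div whose class attribute is exactly className.
    // Assumes className contains no quote characters, since it is pasted into the XPath.
    public static string InnerHtmlWithout(string html, string className)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode wrapper = doc.DocumentNode.SelectSingleNode("//div[@id='wrapper']");
        if (wrapper == null)
        {
            return null;
        }

        // ".//" searches only below wrapper; a leading "//" searches the whole document.
        HtmlNodeCollection matches = wrapper.SelectNodes(".//div[@class='" + className + "']");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (matches != null)
        {
            foreach (HtmlNode match in matches)
            {
                // Remove each node from its own parent, whatever depth it sits at.
                match.ParentNode.RemoveChild(match);
            }
        }

        // wrapper still refers to the wrapper div, so this is the remaining content.
        return wrapper.InnerHtml;
    }

    public static void Main()
    {
        string html =
            "<div id='wrapper'>" +
            "<div class='maincolumn'>" +
            "<div class='breadCrumbContainer'><div class='breadCrumbs'></div></div>" +
            "<div class='seo_list'><div class='seo_head'>Header</div></div>" +
            "Content goes here..." +
            "</div>" +
            "</div>";

        Console.WriteLine(InnerHtmlWithout(html, "breadCrumbContainer"));
    }
}
```

Note that `@class='breadCrumbContainer'` only matches when the class attribute is exactly that string; an element with several classes would need a `contains(...)` test in the XPath instead.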

answered Sep 23 '22 07:09 by Simon Mourier