I have an HTML document and I want to delete all the divs of a certain class (along with all their content). What is the simplest way to do it?
Thank you for your help.
UPDATED:
I tried out Html Agility Pack as you advised, but I could not get it to work. I have the following code:
static void Main()
{
    HtmlDocument document = new HtmlDocument();
    document.Load(FileName);
    HtmlNode node = document.DocumentNode;
    HandleNode(node);
}

private static void HandleNode(HtmlNode node)
{
    while (node != null)
    {
        if (node.Name == "div")
        {
            var attribute = node.Attributes.Where(x => x.Name == "class" && x.Value == "NavContent");
            if (attribute.Any())
                node.Remove();
        }
        foreach (var childNode in node.ChildNodes)
        {
            HandleNode(childNode);
        }
    }
}
But it doesn't do what I want. The recursion never ends and the node name is always comment.
Here's the HTML document I'm trying to parse: http://en.wiktionary.org/wiki/work
Is there a good example of how to work with Html Agility Pack?
What's wrong with this piece of code?
It depends on how complex your HTML is, but you will probably need the Agility Pack library.
HandleNode() contains a while (node != null) loop but never assigns to node, so once the loop is entered it can never exit. I would change that to an if (...) to start with.
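As a rough sketch of that change (not a tested fix for the Wiktionary page), the method below swaps the while for a null check. It also makes two further changes that are my own assumptions: it returns right after removing a node, and it iterates over a copy of ChildNodes, since removing a child while enumerating the live collection is likely to fail. The class name NavContentRemover is made up for illustration.

```csharp
using System.Linq;
using HtmlAgilityPack;

static class NavContentRemover
{
    // Removes every <div class="NavContent"> at or below the given node.
    public static void HandleNode(HtmlNode node)
    {
        // A plain check instead of a loop: node is never reassigned here.
        if (node == null)
            return;

        if (node.Name == "div" && node.GetAttributeValue("class", "") == "NavContent")
        {
            // The whole subtree goes away, so there is nothing left to visit.
            node.Remove();
            return;
        }

        // Copy the children first so removals do not disturb the enumeration.
        foreach (var childNode in node.ChildNodes.ToList())
            HandleNode(childNode);
    }
}
```

Calling NavContentRemover.HandleNode(document.DocumentNode) from Main and then document.Save(...) should write the cleaned document back out.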
You're looking for the HTML Agility Pack.
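For the original question (dropping every div of one class), a shorter route with Html Agility Pack is to select the matching nodes with XPath and remove them. This is a minimal sketch: the helper name RemoveDivsByClass is hypothetical, and it assumes the class attribute is exactly the given value, as in the question's code.

```csharp
using HtmlAgilityPack;

static class DivRemover
{
    // Returns the HTML with every <div> whose class attribute equals className removed.
    // Assumption: className contains no quote characters (it is pasted into the XPath).
    public static string RemoveDivsByClass(string html, string className)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        var matches = document.DocumentNode.SelectNodes($"//div[@class='{className}']");
        if (matches != null)
        {
            foreach (var div in matches)
                div.Remove(); // removes the div together with its content
        }

        return document.DocumentNode.OuterHtml;
    }
}
```

If the divs carry several classes (for example class="NavContent collapsed"), the exact-match predicate will miss them and a contains(...) test in the XPath would be needed instead.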