
Remove attributes using HtmlAgilityPack


I'm trying to create a code snippet to remove all style attributes regardless of tag using HtmlAgilityPack.

Here's my code:

    var elements = htmlDoc.DocumentNode.SelectNodes("//*");

    if (elements != null)
    {
        foreach (var element in elements)
        {
            element.Attributes.Remove("style");
        }
    }

However, I can't get the change to stick. If I inspect the element object immediately after Remove("style"), I can see that the style attribute has been removed, but it still appears in the DocumentNode object. :/

I'm feeling a bit stupid, but it seems off to me? Anyone done this using HtmlAgilityPack? Thanks!

Update

I changed my code to the following, and it works properly:

    public static void RemoveStyleAttributes(this HtmlDocument html)
    {
        var elementsWithStyleAttribute = html.DocumentNode.SelectNodes("//@style");

        if (elementsWithStyleAttribute != null)
        {
            foreach (var element in elementsWithStyleAttribute)
            {
                element.Attributes["style"].Remove();
            }
        }
    }
Ted Nyberg asked May 01 '11 19:05



1 Answer

Your code snippet seems to be correct: it does remove the attributes. The catch is that DocumentNode.InnerHtml (I assume this is the property you were watching) is a complex property. It may only be refreshed under certain circumstances, so you shouldn't rely on it to get the document as a string. Use the HtmlDocument.Save method instead:

    string result = null;
    using (StringWriter writer = new StringWriter())
    {
        htmlDoc.Save(writer);
        result = writer.ToString();
    }

Now the result variable holds the string representation of your document.

One more thing: you can improve your code by changing the XPath expression to "//*[@style]", which selects only the elements that have a style attribute.
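Putting both suggestions together, here is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced. The StyleStripper class, the StripStyles helper, and the sample markup are made up for illustration and are not part of the original question:

```csharp
using System;
using System.IO;
using HtmlAgilityPack;

public static class StyleStripper
{
    // Hypothetical helper: removes every style attribute from the markup
    // and returns the document serialized through HtmlDocument.Save.
    public static string StripStyles(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Select only the elements that actually carry a style attribute.
        var styled = doc.DocumentNode.SelectNodes("//*[@style]");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (styled != null)
        {
            foreach (var node in styled)
            {
                node.Attributes.Remove("style");
            }
        }

        // Serialize with Save rather than reading DocumentNode.InnerHtml.
        using (var writer = new StringWriter())
        {
            doc.Save(writer);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        // Sample input is an assumption, chosen only to exercise the helper.
        string input = "<div style=\"color:red\"><p style=\"margin:0\">Hi</p></div>";
        Console.WriteLine(StripStyles(input));
    }
}
```

Returning the string from Save keeps the check independent of whichever property you happen to be watching in the debugger.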

Oleks answered Sep 17 '22 13:09
