I'm trying to create a code snippet to remove all style
attributes regardless of tag using HtmlAgilityPack.
Here's my code:
    var elements = htmlDoc.DocumentNode.SelectNodes("//*");
    if (elements != null)
    {
        foreach (var element in elements)
        {
            element.Attributes.Remove("style");
        }
    }
However, I'm not getting it to stick. If I look at the element object immediately after Remove("style"), I can see that the style attribute has been removed, but it still appears in the DocumentNode object. :/
I'm feeling a bit stupid, but it seems off to me? Anyone done this using HtmlAgilityPack? Thanks!
Update
I changed my code to the following, and it works properly:
    public static void RemoveStyleAttributes(this HtmlDocument html)
    {
        var elementsWithStyleAttribute = html.DocumentNode.SelectNodes("//@style");
        if (elementsWithStyleAttribute != null)
        {
            foreach (var element in elementsWithStyleAttribute)
            {
                element.Attributes["style"].Remove();
            }
        }
    }
Your code snippet seems to be correct: it does remove the attributes. The thing is, DocumentNode.InnerHtml (I assume this is the property you were monitoring) is a complex property. It may only get updated under certain circumstances, so you shouldn't rely on it to get the document as a string. Use the HtmlDocument.Save method instead:
    string result = null;
    using (StringWriter writer = new StringWriter())
    {
        htmlDoc.Save(writer);
        result = writer.ToString();
    }
Now the result variable holds the string representation of your document.
One more thing: your code may be improved by changing your XPath expression to "//*[@style]", which selects only the elements that have a style attribute.
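Putting both suggestions together, here is a minimal sketch of what this might look like. The class name HtmlStyleStripper and the ToHtmlString helper are hypothetical names chosen for illustration; only SelectNodes, Attributes.Remove and HtmlDocument.Save come from HtmlAgilityPack itself.

```csharp
using System.IO;
using HtmlAgilityPack;

public static class HtmlStyleStripper
{
    // Removes the style attribute from every element that has one.
    public static void RemoveStyleAttributes(this HtmlDocument html)
    {
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var styledElements = html.DocumentNode.SelectNodes("//*[@style]");
        if (styledElements == null)
        {
            return;
        }

        foreach (var element in styledElements)
        {
            element.Attributes.Remove("style");
        }
    }

    // Hypothetical helper: serializes the document through HtmlDocument.Save
    // rather than reading DocumentNode.InnerHtml.
    public static string ToHtmlString(this HtmlDocument html)
    {
        using (var writer = new StringWriter())
        {
            html.Save(writer);
            return writer.ToString();
        }
    }
}
```

Usage would then be something like:

```csharp
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml("<p style=\"color:red\">hi</p>");
htmlDoc.RemoveStyleAttributes();
string result = htmlDoc.ToHtmlString();
```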