I have a string of HTML. It contains varied HTML, but it includes this:
<span style="display:block;position:fixed;width:100%;height:2000px;background-color:rgba(0,0,0,0);z-index:9999!important;top:0;left:0;cursor:default;"></span>
This will seem strange, but I only want to remove specific declarations within the style attribute (for all HTML elements). For example, I want to remove position:fixed, z-index:9999!important, top:0 and left:0, to name a few, but keep everything else. The issue is that it's not necessarily position:fixed; it could be position:absolute or anything else. Likewise, it could be z-index:9998 or top:20, and so on.

I need to be able to remove style declarations by their key, so position:*anything*, top:*anything*, etc., AND do it case-insensitively, so that it also catches POSITION:*anything* or PoSition:*anything*.
Is there a way to achieve this using the Html Agility Pack?
There doesn't appear to be any support for parsing inline style strings in the HTML Agility Pack, but .NET does have some capability for this in System.Web.UI, to support WebForms controls. It's called CssStyleCollection, and it will convert your style string into a collection of string key/value pairs and let you remove the specific keys you do not want.
However, since it's intended for internal WebControl use, it doesn't have a public constructor. Instead, you have to instantiate it via reflection, or use a hack like this:
CssStyleCollection style = new Panel().Style;
Once it's created, assign your style string:
style.Value = "YOUR STYLE STRING";
Then remove the items you don't want:
style.Remove("position");
style.Remove("z-index");
style.Remove("top");
style.Remove("left");
Finally, retrieve the new semicolon-delimited style string from style.Value.
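Putting those steps together with HAP, applied to every element that has a style attribute, a sketch might look like the following. It assumes .NET Framework (System.Web is not available on .NET Core or .NET 5+) plus the HtmlAgilityPack package; the class and method names are hypothetical, and like the steps above it is untested. Whether CssStyleCollection matches keys case-insensitively (POSITION vs position) is also an assumption worth checking before relying on it.

```csharp
using System.Web.UI;              // CssStyleCollection (.NET Framework only)
using System.Web.UI.WebControls;  // Panel, used only to obtain a CssStyleCollection
using HtmlAgilityPack;

// Hypothetical helper wrapping the CssStyleCollection steps above.
public static class CssStyleCollectionCleaner
{
    public static string RemoveStyleKeys(string styleText, params string[] keys)
    {
        // CssStyleCollection has no public constructor, so borrow one from a throwaway Panel.
        CssStyleCollection style = new Panel().Style;
        style.Value = styleText;           // parse the inline style string
        foreach (string key in keys)
            style.Remove(key);             // drop each unwanted property
        return style.Value;                // re-serialized style string (may be null/empty)
    }

    public static string CleanHtml(string html, params string[] keys)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//*[@style]");
        if (nodes != null)
        {
            foreach (HtmlNode node in nodes)
            {
                string cleaned = RemoveStyleKeys(node.GetAttributeValue("style", ""), keys);
                if (string.IsNullOrEmpty(cleaned))
                    node.Attributes.Remove("style");   // nothing left, drop the attribute
                else
                    node.SetAttributeValue("style", cleaned);
            }
        }
        return doc.DocumentNode.OuterHtml;
    }
}
```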
IMPORTANT: I haven't tested this, but the process seems simple enough, if a bit hacky. There may be some surprises I haven't come across yet. In particular, I have no idea how it handles multiple duplicate style settings in the same string, such as:
top:0;margin-left:20;top:10;
In inline style strings, browsers respect the last specified value, so top:10 wins. However, since CssStyleCollection uses unique keys, it cannot store both top values and will most likely discard one.
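To check how any given approach treats duplicates, it helps to have a reference that resolves them the way the paragraph above describes, keeping only the last occurrence of each property. A minimal sketch, assuming plain string splitting is good enough (it does not handle ';' inside quoted values such as url("a;b"), and it ignores !important, which can override "last wins"); the class and method names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateStyleDemo
{
    // Splits an inline style string into (name, value) declarations,
    // skipping empty or malformed entries.
    public static List<(string Name, string Value)> Parse(string style)
    {
        var declarations = new List<(string Name, string Value)>();
        foreach (string entry in style.Split(';'))
        {
            // Split on the first ':' only, so values containing ':' survive.
            string[] parts = entry.Split(new[] { ':' }, 2);
            if (parts.Length < 2 || parts[0].Trim().Length == 0) continue;
            declarations.Add((parts[0].Trim(), parts[1].Trim()));
        }
        return declarations;
    }

    // Keeps only the last occurrence of each property (case-insensitive),
    // at its original position, so "top:0;margin-left:20;top:10;"
    // becomes "margin-left:20;top:10".
    public static string KeepLastOccurrences(string style)
    {
        var declarations = Parse(style);
        var kept = new List<string>();
        for (int i = 0; i < declarations.Count; i++)
        {
            string name = declarations[i].Name;
            bool overriddenLater = declarations
                .Skip(i + 1)
                .Any(d => string.Equals(d.Name, name, StringComparison.OrdinalIgnoreCase));
            if (!overriddenLater)
                kept.Add(name + ":" + declarations[i].Value);
        }
        return string.Join(";", kept);
    }
}
```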
I think you'll just have to use HAP to grab the elements you want to clean up, grab the styles from the attribute, and then loop over them to clean them manually.
I'd split on ";" and then on ":" to get name/value pairs. Loop over them, lowercase the name, and throw it into a switch statement with stacked (fall-through) case labels for the keys you want to drop, plus a default that appends the name/value pair to a new string. Then inject the new style string back into your attribute.
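// Rough sketch, not the real deal!!
// Inspired from http://htmlagilitypack.codeplex.com/wikipage?title=Examples
HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
// Select every element with a style attribute; SelectNodes returns null if there are none
HtmlNodeCollection styled = doc.DocumentNode.SelectNodes("//*[@style]");
if (styled != null)
{
    foreach (HtmlNode node in styled)
    {
        HtmlAttribute att = node.Attributes["style"];
        att.Value = CleanStyles(att.Value);
    }
}
doc.Save("file.htm");
// Elsewhere (needs: using System.Collections.Generic;)
public static string CleanStyles(string oldStyles) {
    var newStyles = new List<string>();
    foreach (var entry in oldStyles.Split(';')) {
        // Split on the first ':' only, so values that contain ':' stay intact
        var values = entry.Split(new[] { ':' }, 2);
        if (values.Length < 2) continue; // skip empty or malformed entries
        switch (values[0].Trim().ToLowerInvariant()) {
            case "position":
            case "z-index":
            case "top":
            case "left":
                // Do nothing, skip this value
                break;
            default:
                newStyles.Add(values[0].Trim() + ":" + values[1].Trim());
                break;
        }
    }
    return string.Join(";", newStyles);
}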
Something like that anyway.
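If you'd rather not hard-code the keys in a switch, a HashSet built with a case-insensitive comparer can hold them instead, which also removes the need to lowercase names by hand. A minimal standalone sketch with no HAP dependency, so the string logic can be tested on its own; StyleFilter and RemoveKeys are hypothetical names, and the same splitting caveats apply (no handling of ';' inside quoted values):

```csharp
using System;
using System.Collections.Generic;

public static class StyleFilter
{
    // Removes every declaration whose property name is in keysToRemove.
    // Case-insensitivity comes from the set's comparer (e.g. OrdinalIgnoreCase),
    // so "POSITION" and "PoSition" both match "position".
    public static string RemoveKeys(string style, ISet<string> keysToRemove)
    {
        var kept = new List<string>();
        foreach (string entry in style.Split(';'))
        {
            string[] parts = entry.Split(new[] { ':' }, 2);
            if (parts.Length < 2) continue; // empty or malformed entry
            string name = parts[0].Trim();
            if (!keysToRemove.Contains(name))
                kept.Add(name + ":" + parts[1].Trim());
        }
        return string.Join(";", kept);
    }
}

// Example:
// var keys = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "position", "z-index", "top", "left" };
// StyleFilter.RemoveKeys("display:block;POSITION:fixed;width:100%;Top:0;cursor:default;", keys)
//   returns "display:block;width:100%;cursor:default"
```

The returned string can then be written back with att.Value = ... exactly as in the loop above.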
There is a very simple way of editing the style attribute in HAP, as shown in this example: https://html-agility-pack.net/knowledge-base/12062495/better-way-to-add-a-style-attribute-to-html-using-htmlagilitypack.
const string margin = "margin-top: 0";
foreach (var pTagNode in pTagNodes)
{
var styles = pTagNode.GetAttributeValue("style", null);
var separator = (styles == null ? null : "; ");
pTagNode.SetAttributeValue("style", styles + separator + margin);
}
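One wrinkle in that snippet: if the existing style already ends in ';' (as the span in the question does), the result contains ';;'. Browsers ignore the empty declaration, but a small helper can keep the string tidy. A minimal sketch; StyleAppender and AppendDeclaration are hypothetical names, and the HAP calls in the comment are the same GetAttributeValue/SetAttributeValue pair used above:

```csharp
using System;

public static class StyleAppender
{
    // Appends one declaration to an existing inline style string,
    // avoiding a doubled ";;" when the existing value already ends in ';'.
    public static string AppendDeclaration(string styles, string declaration)
    {
        if (string.IsNullOrWhiteSpace(styles))
            return declaration;
        return styles.TrimEnd().TrimEnd(';') + "; " + declaration;
    }
}

// Usage inside the loop above:
// pTagNode.SetAttributeValue("style",
//     StyleAppender.AppendDeclaration(pTagNode.GetAttributeValue("style", null), margin));
```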