Strip Specific Styles from the Style attribute in Html string using Html Agility Pack

I have a string of HTML whose content varies, but it always includes this:

<span style="display:block;position:fixed;width:100%;height:2000px;background-color:rgba(0,0,0,0);z-index:9999!important;top:0;left:0;cursor:default;"></span>

This will seem strange, but I only want to remove specific items within the style attribute (for all HTML elements). For example, I want to remove

position:fixed and z-index:9999!important; and top:0; and left:0;

to name a few, but keep everything else. The issue is that it's not necessarily position:fixed; it could be position:absolute; or any other value, just as it could be z-index:9998; or top:20; etc.

I need to be able to remove style declarations by their key, so position:*anything*, top:*anything*, etc., and to do this in a case-insensitive manner, so that it also catches POSITION:*anything* or PoSition:*anything*.

Is there a way to achieve this using the Html Agility Pack?

YodasMyDad asked Jul 07 '15 11:07


3 Answers

There doesn't appear to be any support for inline style string parsing in the HTML Agility Pack, but .NET does have some capabilities for this in System.Web.UI to support WebForms controls.

It's called CssStyleCollection. It parses your style string into a collection of key/value pairs and lets you remove the specific keys you do not want.

However, since it's meant for internal use by WebControl, it doesn't have a public constructor. Instead, you have to instantiate it via reflection, or use a hack like this:

CssStyleCollection style = new Panel().Style;

Once created, assign your style string to it:

style.Value = "YOUR STYLE STRING"; 

Then remove the items you don't want:

style.Remove("position");
style.Remove("z-index");
style.Remove("top");
style.Remove("left");

Retrieve your new delimited style string from style.Value.

IMPORTANT: I haven't tested this, but the process seems simple enough, if a bit hacky. There may be some surprises I haven't come across yet. In particular, I have no idea how it handles the same style property appearing more than once in one string:

top:0;margin-left:20;top:10; 

In inline style strings, browsers will respect the last specified value, so top:10 wins. However since CssStyleCollection uses unique keys, it cannot store both top values and most likely discards one.
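Putting the steps above together, a minimal sketch might look like the following. It is as untested as the rest of this answer, and it assumes a .NET Framework project that references System.Web. The class and method names (InlineStyleCleaner, RemoveStyleKeys) are made up for illustration and are not part of any library.

```csharp
using System.Web.UI;
using System.Web.UI.WebControls;

public static class InlineStyleCleaner
{
    // Hypothetical helper: returns styleText without the given style keys.
    public static string RemoveStyleKeys(string styleText, params string[] keys)
    {
        // CssStyleCollection has no public constructor, so borrow one from a Panel.
        CssStyleCollection style = new Panel().Style;

        // Assigning Value makes the collection parse the style string.
        style.Value = styleText;

        foreach (string key in keys)
        {
            style.Remove(key);
        }

        // Reading Value gives back the remaining declarations as one string.
        return style.Value;
    }
}
```

It would be called as InlineStyleCleaner.RemoveStyleKeys(styleString, "position", "z-index", "top", "left"). How the result is formatted, and whether key matching ignores case, is something to check against your own input before relying on it.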

Memetican answered Nov 15 '22 03:11


I think you'll just have to use HAP to grab the elements you want to clean up, read the styles from the attribute, and then loop over them to clean them manually.

I'd split on the ";" and then on the ":" to get name/value pairs. Loop over them, lowercase the name, and pass it to a switch statement in which the unwanted names are stacked as case labels that do nothing, with a default that appends the name/value pair to a new string. Then write the new string of styles back into the attribute.

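 // A sketch rather than production code
 // Inspired from http://htmlagilitypack.codeplex.com/wikipage?title=Examples
 // Needs: using System.Text; using HtmlAgilityPack;
 HtmlDocument doc = new HtmlDocument();

 doc.Load("file.htm");
 // SelectNodes returns null when nothing matches, so guard before looping
 var nodes = doc.DocumentNode.SelectNodes("//*[@style]");
 if (nodes != null)
 {
    foreach (HtmlNode node in nodes)
    {
       HtmlAttribute att = node.Attributes["style"];
       att.Value = CleanStyles(att.Value);
    }
 }
 doc.Save("file.htm");

 // Elsewhere
 public static string CleanStyles(string oldStyles) {
    var newStyles = new StringBuilder();
    foreach (var entry in oldStyles.Split(';')) {
       // Split on the first ':' only, so values such as url(http://...) stay intact
       var parts = entry.Split(new[] { ':' }, 2);
       if (parts.Length != 2) continue; // skip empty or malformed declarations
       switch (parts[0].Trim().ToLowerInvariant()) {
          case "position":
          case "z-index":
          case "top":
          case "left":
             // Do nothing, skip this declaration
             break;
          default:
             newStyles.Append(entry.Trim()).Append(';');
             break;
       }
    }
    return newStyles.ToString();
 }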

Something like that anyway.
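If the list of properties to strip needs to be configurable, the same split-and-filter idea can take the names as a parameter instead of hard-coding them in a switch. The following is only a sketch: StyleFilter and RemoveProperties are illustrative names, and it assumes the style string contains no ";" inside a value (for example in a data: URL or a quoted string), which a plain Split would cut in the wrong place.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class StyleFilter
{
    // Hypothetical helper: drops every declaration whose property name is in
    // propertiesToRemove, comparing names case-insensitively.
    public static string RemoveProperties(string style, IEnumerable<string> propertiesToRemove)
    {
        var blocked = new HashSet<string>(propertiesToRemove, StringComparer.OrdinalIgnoreCase);
        var kept = new StringBuilder();

        foreach (string declaration in style.Split(';'))
        {
            // The property name is everything before the first ':'.
            int colon = declaration.IndexOf(':');
            if (colon < 0) continue; // empty or malformed entry

            string name = declaration.Substring(0, colon).Trim();
            if (blocked.Contains(name)) continue; // unwanted property, skip it

            kept.Append(declaration.Trim()).Append(';');
        }

        return kept.ToString();
    }
}
```

For example, calling StyleFilter.RemoveProperties("display:block;POSITION:fixed;width:100%;z-index:9999!important;top:0;left:0;cursor:default;", new[] { "position", "z-index", "top", "left" }) leaves "display:block;width:100%;cursor:default;".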

Pete Duncanson answered Nov 15 '22 04:11


There is a very simple way of editing the style attribute in HAP, as seen in the example here: https://html-agility-pack.net/knowledge-base/12062495/better-way-to-add-a-style-attribute-to-html-using-htmlagilitypack.

const string margin = "margin-top: 0";
foreach (var pTagNode in pTagNodes)
{
    var styles = pTagNode.GetAttributeValue("style", null);
    var separator = (styles == null ? null : "; ");
    pTagNode.SetAttributeValue("style", styles + separator + margin);
}
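
The linked example appends a declaration, but the same GetAttributeValue / SetAttributeValue pair can also write back a filtered style string. Here is a rough sketch for an HTML string rather than a file. It assumes the HtmlAgilityPack package is referenced; InlineStyleRewriter and RewriteInlineStyles are made-up names, and the clean delegate stands for whatever filtering function you choose to pass in.

```csharp
using System;
using HtmlAgilityPack;

public static class InlineStyleRewriter
{
    // Hypothetical helper: runs `clean` over every style attribute in the HTML string.
    public static string RewriteInlineStyles(string html, Func<string, string> clean)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty list) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//*[@style]");
        if (nodes == null) return html;

        foreach (HtmlNode node in nodes)
        {
            string cleaned = clean(node.GetAttributeValue("style", ""));
            if (string.IsNullOrWhiteSpace(cleaned))
                node.Attributes.Remove("style"); // nothing left, drop the attribute
            else
                node.SetAttributeValue("style", cleaned);
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Passing the filter in as a delegate keeps the HAP plumbing separate from the string handling, so the filtering logic can be tested on plain strings.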

syonip answered Nov 15 '22 04:11