
Remove all inline styles and (most) classes from an HTML string

I'll start from the end:
In my C# program, I have a string containing HTML. I'd like to remove all inline style attributes (style="..") from the elements in this string, along with all classes beginning with 'abc'.
I'm willing to use regular expressions for this, even though some people bitch about it :).

(An explanation, for those wishing to berate me for parsing HTML strings:
I'm forced to use a less-than-optimal web control for my project. The control is designed to be used server-side (i.e. with postbacks and all that stuff), but I'm required to use it in AJAX calls.
That means I have to configure it in code, call its Render() method, which gives me the HTML string, and pass that string to the client side, where it's inserted into the DOM at the appropriate place. Unfortunately, I wasn't able to find a configuration of the control that stops it from rendering itself with these useless styles and classes, so I'm forced to remove them by hand. Please don't hate me.)

asked Dec 02 '22 17:12 by J. Ed



2 Answers

Try this:

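using System.Text.RegularExpressions;
string html = "<div class=\"abc1 mine\" style=\"color:red\">text</div>";  // sample input
// Drop every double-quoted inline style attribute, with the whitespace before it.
string cleaned = Regex.Replace(html, "\\s+style=\"[^\"]*\"", "");
// Drop each class starting with "abc"; .NET allows the variable-length lookbehind used here.
cleaned = Regex.Replace(cleaned, "(?<=\\sclass=\"(?:[^\"]*\\s)?)abc[^\\s\"]*\\s*", "");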
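A variation on the regex approach, offered only as a sketch: filtering the class list token by token inside a MatchEvaluator callback removes every matching class in an attribute and makes it easy to drop the attribute once nothing is left. The names HtmlCleaner, Clean, and the prefix parameter are hypothetical, and the patterns assume attribute values are double-quoted, as in the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class HtmlCleaner
{
    // A double-quoted style attribute plus the whitespace in front of it.
    private static readonly Regex StyleAttr =
        new Regex("\\s+style=\"[^\"]*\"", RegexOptions.IgnoreCase);

    // A double-quoted class attribute; group 1 captures its value.
    private static readonly Regex ClassAttr =
        new Regex("\\s+class=\"([^\"]*)\"", RegexOptions.IgnoreCase);

    public static string Clean(string html, string prefix = "abc")
    {
        string withoutStyles = StyleAttr.Replace(html, "");

        return ClassAttr.Replace(withoutStyles, m =>
        {
            // Keep only the class names that do not start with the prefix.
            var kept = m.Groups[1].Value
                .Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
                .Where(c => !c.StartsWith(prefix, StringComparison.Ordinal));
            string value = string.Join(" ", kept);

            // Drop the attribute entirely when no class survives.
            return value.Length == 0 ? "" : " class=\"" + value + "\"";
        });
    }
}
```

For example, Clean("<div class=\"abc-grid mine abcRow\" style=\"color:red\">x</div>") returns <div class="mine">x</div>. Single-quoted or unquoted attribute values are left untouched by this sketch.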
answered Dec 04 '22 07:12 by Bohemian



For anyone interested: I solved this without using regex.
Instead, I used XDocument to parse the HTML:

// Requires: using System.Linq; using System.Xml.Linq;
// Note: XDocument.Parse throws unless the markup is well-formed XML with a single root element.
private string MakeHtmlGood(string html)
{
    var xmlDoc = XDocument.Parse(html);

    // Remove all inline styles
    xmlDoc.Descendants().Attributes("style").Remove();

    // Remove all classes inserted by 3rd party, without removing our own lovely classes
    foreach (var node in xmlDoc.Descendants())
    {
        var classAttribute = node.Attributes("class").SingleOrDefault();
        if (classAttribute == null)
        {
            continue;
        }

        var classesThatShouldStay = classAttribute.Value.Split(' ').Where(className => !className.StartsWith("abc"));
        classAttribute.SetValue(string.Join(" ", classesThatShouldStay));
    }

    return xmlDoc.ToString();
}
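If the rendered markup is a fragment with several top-level elements, XDocument.Parse will reject it because XML allows only one root. A possible workaround, sketched here under that assumption, is to wrap the fragment in a temporary root element and serialize only its children afterwards. XmlHtmlCleaner, Clean, the prefix parameter, and the wrapper element name root are hypothetical; the sketch also removes class attributes that end up empty. It still needs XML-compatible markup, so unclosed tags such as <br> or entities such as &nbsp; will make parsing fail.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class XmlHtmlCleaner
{
    public static string Clean(string htmlFragment, string prefix = "abc")
    {
        // Wrap in a throwaway root so fragments with several top-level elements parse.
        XElement root = XElement.Parse(
            "<root>" + htmlFragment + "</root>", LoadOptions.PreserveWhitespace);

        // Remove all inline styles.
        root.Descendants().Attributes("style").Remove();

        // ToList() takes a snapshot so attributes can be removed while iterating.
        foreach (XAttribute attr in root.Descendants().Attributes("class").ToList())
        {
            var kept = attr.Value
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                .Where(c => !c.StartsWith(prefix, StringComparison.Ordinal));
            string value = string.Join(" ", kept);

            if (value.Length == 0)
                attr.Remove();       // nothing left, so drop class="" entirely
            else
                attr.Value = value;
        }

        // Serialize the children only, leaving the temporary wrapper out.
        return string.Concat(
            root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }
}
```

For example, Clean("<div class=\"abc1 mine\" style=\"a:b\">x</div><span class=\"abcOnly\">y</span>") returns <div class="mine">x</div><span>y</span>.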
answered Dec 04 '22 06:12 by J. Ed
