I'll start from the end:
In my C# program, I have a string containing HTML, and I'd like to remove all inline style attributes (style="..") and all classes beginning with 'abc' from the elements in this string.
I'm willing to use regular expressions for this, even though some people bitch about it :).
(An explanation, for those wishing to berate me for parsing HTML strings:
I'm forced to use a less-than-optimal web control for my project. The control is designed to be used server-side (i.e. with postbacks and all that stuff), but I'm required to use it in AJAX calls, which means that I have to configure it in code, call its Render() method, which gives me the HTML string, and pass that string to the client side, where it's inserted into the DOM at the appropriate place. Unfortunately, I wasn't able to find the correct configuration of the control to stop it from rendering itself with these useless styles and classes, so I'm forced to remove them by hand. Please don't hate me.)
Try this:
// Requires: using System.Text.RegularExpressions;
// html is the string returned by the control's Render() method.
string cleaned = Regex.Replace(html, "style=\"[^\"]*\"", "");
cleaned = Regex.Replace(cleaned, "(?<=class=\")([^\"]*)\\babc\\w*\\b([^\"]*)(?=\")", "$1$2");
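One caveat with the second pattern, as far as I can tell: each match consumes the whole class value, so a single pass removes at most one 'abc' class per attribute. A minimal sketch of a variant that filters every class name through a match evaluator is below. The helper name `StripStylesAndAbcClasses` and the class `HtmlCleaner` are made up for illustration, and the sketch assumes attributes are double-quoted and preceded by whitespace.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class HtmlCleaner
{
    // Hypothetical helper, not part of the original answer.
    // Assumes double-quoted attribute values, as in the patterns above.
    public static string StripStylesAndAbcClasses(string html)
    {
        // Drop every inline style attribute together with its leading whitespace.
        string cleaned = Regex.Replace(html, "\\s+style=\"[^\"]*\"", "");

        // Rewrite each class attribute, keeping only names that do not start with "abc".
        return Regex.Replace(cleaned, "(?<=\\s)class=\"([^\"]*)\"", m =>
        {
            var kept = m.Groups[1].Value
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                .Where(name => !name.StartsWith("abc", StringComparison.Ordinal));
            return "class=\"" + string.Join(" ", kept) + "\"";
        });
    }
}
```

If every class on an element starts with 'abc', this leaves an empty class="" attribute behind, which browsers ignore.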
To anyone interested: I've solved this without using regex. Instead, I used XDocument to parse the HTML:
// Requires: using System.Linq; using System.Xml.Linq;
private string MakeHtmlGood(string html)
{
    var xmlDoc = XDocument.Parse(html);

    // Remove all inline styles
    xmlDoc.Descendants().Attributes("style").Remove();

    // Remove all classes inserted by 3rd party, without removing our own lovely classes
    foreach (var node in xmlDoc.Descendants())
    {
        var classAttribute = node.Attributes("class").SingleOrDefault();
        if (classAttribute == null)
        {
            continue;
        }

        var classesThatShouldStay = classAttribute.Value.Split(' ').Where(className => !className.StartsWith("abc"));
        classAttribute.SetValue(string.Join(" ", classesThatShouldStay));
    }

    return xmlDoc.ToString();
}
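Note that XDocument.Parse only accepts well-formed XML, so markup with an unclosed `<br>` or an undeclared entity such as `&nbsp;` throws an XmlException. Below is a hedged sketch of the same idea with that case handled. The names `XmlHtmlCleaner` and `TryClean` are hypothetical, and returning null on failure is just one possible choice; it also drops class attributes that end up empty and disables the default indentation on output.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

static class XmlHtmlCleaner
{
    // Hypothetical wrapper; returns null when the markup is not well-formed XML.
    public static string TryClean(string html)
    {
        XDocument doc;
        try
        {
            doc = XDocument.Parse(html, LoadOptions.PreserveWhitespace);
        }
        catch (XmlException)
        {
            return null; // e.g. an unclosed <br> or an undeclared entity such as &nbsp;
        }

        // Remove all inline styles.
        doc.Descendants().Attributes("style").Remove();

        // Snapshot the class attributes first so removing one does not disturb the enumeration.
        foreach (var attr in doc.Descendants().Attributes("class").ToList())
        {
            var kept = attr.Value
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                .Where(name => !name.StartsWith("abc", StringComparison.Ordinal))
                .ToList();

            if (kept.Count == 0)
                attr.Remove();                      // nothing left, drop the attribute
            else
                attr.Value = string.Join(" ", kept);
        }

        // DisableFormatting keeps the serializer from re-indenting the markup.
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

A caller could fall back to the regex approach whenever `TryClean` returns null.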