I need to remove all style tags completely for the given HTML code. I found following regex to match entire style tag in the the XML. It works fine for the given Html code in online regex testers.
*style\s*=\s*('|")[^\2]*?\2([^>]*)*
However, through a C# code, it didn't work for the given HTML.
Following is the C# code:
Regex regex = new Regex("style\\s*=\\s*('|\")[^\\2]*?\\2([^>]*)", RegexOptions.IgnoreCase);
Regex should be
style\s*=\s*('|")[^\1]*\1
Though I would use Htmlagilitypack
HtmlDocument doc = new HtmlDocument();
doc.Load(yourStream);
var elementsWithStyleAttribute = doc.DocumentNode.SelectNodes("//@style");
foreach (var element in elementsWithStyleAttribute)
{
element.Attributes["style"].Remove();
}
doc.Save();
I usually use the below code to remove inline styles, class, images and comments from an Outlook message prior to saving it into database:
desc = Regex.Replace(desc, "(<style.+?</style>)|(<script.+?</script>)", "", RegexOptions.IgnoreCase | RegexOptions.Singleline);
desc = Regex.Replace(desc, "(<img.+?>)", "", RegexOptions.IgnoreCase | RegexOptions.Singleline);
desc = Regex.Replace(desc, "(<o:.+?</o:.+?>)", "", RegexOptions.IgnoreCase | RegexOptions.Singleline);
desc = Regex.Replace(desc, "<!--.+?-->", "", RegexOptions.IgnoreCase | RegexOptions.Singleline);
desc = Regex.Replace(desc, "class=.+?>", ">", RegexOptions.IgnoreCase | RegexOptions.Singleline);
desc = Regex.Replace(desc, "class=.+?\s", " ", RegexOptions.IgnoreCase | RegexOptions.Singleline);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With