I'm creating mails in one of my solutions and need to provide both HTML and plain-text versions of each mail from a given HTML page.
However, I haven't found any really good way to strip HTML, JS and CSS from whatever HTML template the customers might provide.
Is there a simple solution to this, perhaps a component that handles all of this, or do I need to start puzzling with regular expressions? And is it even possible to write a bulletproof regex that covers all possible tags?
Regards
Give HtmlAgilityPack a go. It has methods for extracting the text from an HTML document.
You basically just need to do the following:
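```csharp
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml(htmlStr);          // htmlStr holds the customer's HTML template
var node = doc.DocumentNode;    // root of the parsed tree
var textContent = node.InnerText; // concatenated text of all descendant text nodes
```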
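One caveat for the mail use case: as far as I know, `InnerText` on its own can still contain the contents of `<script>` and `<style>` blocks (the parser keeps them as text) and leaves entities such as `&amp;` undecoded. Below is a minimal sketch, assuming a recent HtmlAgilityPack from NuGet, that removes those nodes and comments before reading the text. The class name `HtmlToText` and method `ToPlainText` are hypothetical names for illustration; `SelectNodes`, `Remove` and `HtmlEntity.DeEntitize` are regular HtmlAgilityPack members.

```csharp
using System;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

// Hypothetical helper: turns an HTML template into text for the text/plain mail part.
public static class HtmlToText
{
    public static string ToPlainText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Drop JS, CSS and HTML comments so their contents don't leak into InnerText.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var unwanted = doc.DocumentNode.SelectNodes("//script|//style|//comment()");
        if (unwanted != null)
        {
            // SelectNodes returns a separate collection, so removing from the tree here is safe.
            foreach (var node in unwanted)
            {
                node.Remove();
            }
        }

        // Concatenate the remaining text and decode entities such as &amp; and &nbsp;.
        string text = HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);

        // Collapse whitespace runs left behind by the markup into single spaces.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        const string html = "<html><head><style>p { color: red; }</style>" +
                            "<script>alert('hi');</script></head>\n" +
                            "<body><p>Hello &amp; welcome</p>\n<p>Bye</p></body></html>";
        Console.WriteLine(ToPlainText(html));
    }
}
```

Note that `InnerText` does not insert separators between block elements, so adjacent tags like `<p>a</p><p>b</p>` with no whitespace between them come out as `ab`; if the plain-text mail needs line breaks per paragraph, you would have to walk the nodes yourself instead of collapsing all whitespace.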
If you want a component that can strip HTML for you, try the Html Agility Pack.