I'm generating emails in one of my solutions and need to produce both HTML and plain-text versions from a given HTML page.
However, I haven't found a good way to strip HTML, JS, and CSS from whatever HTML template the customers might provide.
Is there a simple solution to this, perhaps a component that handles all of it, or do I need to start puzzling with regexps? And is it even possible to create a bulletproof regexp for all possible tags?
Regards
Give HtmlAgilityPack a go. It has methods for extracting the text out of an HTML Document.
You basically just need to do the following:
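  // requires: using HtmlAgilityPack;
  var doc = new HtmlDocument();
  doc.LoadHtml(htmlStr);               // htmlStr holds the HTML template as a string
  var node = doc.DocumentNode;         // root node of the parsed document
  var textContent = node.InnerText;    // concatenated text of all nodes under the root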
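One caveat: with default settings, InnerText on the root node can also return the contents of script and style elements, and it leaves entities such as &amp;amp; encoded. Since you want the JS and CSS gone as well, here is a minimal sketch that removes those elements before reading the text. The class and method names (HtmlToText, Convert) are made up for illustration and are not part of HtmlAgilityPack; the behaviour may also differ between library versions, so treat it as a starting point rather than a finished converter.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack itself.
public static class HtmlToText
{
    public static string Convert(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Collect script and style elements first, then remove them,
        // so the node tree is not modified while it is being enumerated.
        var unwanted = doc.DocumentNode
            .Descendants()
            .Where(n => n.Name == "script" || n.Name == "style")
            .ToList();

        foreach (var node in unwanted)
        {
            node.Remove();
        }

        // Decode entities (for example &amp; to &) and trim surrounding whitespace.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText).Trim();
    }
}
```

You would call it as `var text = HtmlToText.Convert(htmlStr);`. Whitespace and line breaks from the original markup are kept as they are, so you will probably want to normalise those yourself for the plain-text mail.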
As a component that can strip HTML: Html Agility Pack.