 

strip HTML and CSS in C#

Tags:

html

string

c#

I'm creating emails in one of my solutions and need to provide both HTML and plain-text versions, generated from a given HTML page.

However, I haven't found a really good way to strip HTML, JavaScript and CSS from whatever HTML template the customers might provide.

Is there a simple solution to this, perhaps a component that handles all of it, or do I need to start puzzling with regular expressions? And is it even possible to write a bulletproof regular expression for all possible tags?
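For comparison, here is a minimal sketch of the regex route the question asks about, using only `System.Text.RegularExpressions` and `System.Net.WebUtility`. It is not bulletproof: HTML is not a regular language, and a `>` inside a quoted attribute value, a bare `<` in text, malformed markup or CDATA sections can all fool it, which is why the answers point to a real HTML parser. The class and method names (`NaiveHtmlStripper`, `StripTags`) are hypothetical.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: a naive regex-based HTML-to-text stripper.
// NOT bulletproof: a '>' inside a quoted attribute value, a bare '<' in text,
// malformed markup or CDATA sections can all produce wrong output.
public static class NaiveHtmlStripper
{
    // <script> and <style> blocks, including everything between the tags.
    private static readonly Regex ScriptOrStyle = new Regex(
        @"<(script|style)\b[^>]*>.*?</\1\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // HTML comments, which may span several lines.
    private static readonly Regex Comment =
        new Regex(@"<!--.*?-->", RegexOptions.Singleline);

    // Any remaining tag: '<' followed by anything up to the next '>'.
    private static readonly Regex Tag = new Regex(@"<[^>]+>");

    // Runs of whitespace (including the non-breaking space decoded from &nbsp;).
    private static readonly Regex Whitespace = new Regex(@"\s+");

    public static string StripTags(string html)
    {
        string text = ScriptOrStyle.Replace(html, " ");
        text = Comment.Replace(text, " ");
        // Replace tags with a space so adjacent blocks such as </p><p> don't
        // run together; the trade-off is that an inline tag like <b> inside a
        // word introduces a spurious space.
        text = Tag.Replace(text, " ");
        // Decode character references such as &amp; and &nbsp;.
        text = WebUtility.HtmlDecode(text);
        return Whitespace.Replace(text, " ").Trim();
    }
}
```

For example, `NaiveHtmlStripper.StripTags("<p>Hi <b>there</b></p>")` returns `"Hi there"`, but `<a title="1>0">x</a>` would leak `0">x` into the output.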

Regards

asked Dec 22 '22 16:12 by elwis


2 Answers

Give HtmlAgilityPack a go. It can parse an HTML document and extract the text from it.

You basically just need to do the following:

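  using HtmlAgilityPack;

  var doc = new HtmlDocument();
  doc.LoadHtml(htmlStr);             // htmlStr: the HTML template as a string
  var node = doc.DocumentNode;
  // InnerText returns the text of all descendant nodes, with the tags removed
  var textContent = node.InnerText;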
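Note that `InnerText` on its own also returns the contents of `<script>` and `<style>` elements, which the question wants gone. A minimal sketch that removes those nodes (and comments) before reading the text, assuming the HtmlAgilityPack NuGet package is referenced; `PlainTextExtractor` and `Extract` are hypothetical names:

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

// Hypothetical helper built on Html Agility Pack.
public static class PlainTextExtractor
{
    public static string Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // InnerText would also return the contents of <script> and <style>
        // elements (and may include comments), so remove those nodes first.
        // ToList() copies the matches so the tree isn't modified mid-enumeration.
        var unwanted = doc.DocumentNode.Descendants()
            .Where(n => n.Name == "script" || n.Name == "style"
                        || n.NodeType == HtmlNodeType.Comment)
            .ToList();
        foreach (var node in unwanted)
        {
            node.Remove();
        }

        // Concatenate the remaining text nodes and collapse whitespace runs.
        string text = doc.DocumentNode.InnerText;
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

Adjacent block elements with no whitespace between them (e.g. `<p>a</p><p>b</p>`) will still run together, and depending on the library version `InnerText` may leave character references such as `&amp;` encoded; `HtmlEntity.DeEntitize` can decode them.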
answered Dec 29 '22 01:12 by paracycle


For a component that can strip HTML, try the Html Agility Pack.

answered Dec 29 '22 00:12 by wassertim