Get plain text from HTML in .NET

Tags: html, string, .net

What is the best way to get a plain text string from an HTML string?

    public string GetPlainText(string htmlString)
    {
        // any .NET built in utility?
    }

Thanks in advance

Daniel Peñalba asked May 03 '11 13:05



1 Answer

You can use MSHTML, which can be pretty forgiving of malformed markup:

    // using microsoft.mshtml
    HTMLDocument htmldoc = new HTMLDocument();
    IHTMLDocument2 htmldoc2 = (IHTMLDocument2)htmldoc;
    htmldoc2.write(new object[] { "<p>Plateau <i>of<i> <b>Leng</b><hr /><b erp=\"arp\">2 sugars please</b> <xxx>what? &amp; who?" });

    string txt = htmldoc2.body.outerText;

Plateau of Leng 2 sugars please what? & who?
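
For reference, the same calls can be dropped into the GetPlainText signature from the question. This is only a minimal sketch: it assumes the project references the Microsoft.mshtml interop assembly (which exposes the mshtml namespace) and runs on Windows, and the HtmlText class name is made up for illustration.

```csharp
using mshtml; // assumption: project references the Microsoft.mshtml interop assembly

// Hypothetical helper class; the name is not from the original answer.
public static class HtmlText
{
    public static string GetPlainText(string htmlString)
    {
        // Create an in-memory MSHTML document and feed it the markup.
        HTMLDocument htmldoc = new HTMLDocument();
        IHTMLDocument2 htmldoc2 = (IHTMLDocument2)htmldoc;
        htmldoc2.write(new object[] { htmlString });

        // outerText is the rendered text of the body, with tags removed
        // and entities such as &amp; decoded.
        return htmldoc2.body.outerText;
    }
}
```

Called as HtmlText.GetPlainText(html), it returns the text the same way the snippet above does. Because MSHTML is a COM component, it may not be a good fit for server-side or cross-platform code.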

Alex K. answered Sep 24 '22 19:09