
How to convert HTML text to plain text? [duplicate]

Tags:

java

html

Friends, I have to parse the description from a URL, and the parsed content contains a few HTML tags. How can I convert it to plain text?

MGSenthil asked Aug 31 '10 10:08


People also ask

How do I display HTML as plain text?

You can show HTML tags as plain text on a web page by replacing < with &lt; or &#60; and > with &gt; or &#62; in each tag that you want to be visible. Ordinarily, the browser interprets HTML tags instead of displaying them to the reader.
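
The replacement described above can be sketched in Java as follows. This is a minimal illustration, not part of the original answer: the class name HtmlEscaper and the method escapeHtml are hypothetical, and the sketch also escapes & (an assumption added here so that existing entities in the input are not misread by the browser).

```java
final class HtmlEscaper {

    // Replace the characters that HTML treats as markup with entity
    // references, so the browser shows the tags instead of interpreting them.
    static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break; // assumption: escape & as well
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints &lt;p&gt;Hello&lt;/p&gt;
        System.out.println(escapeHtml("<p>Hello</p>"));
    }
}
```

The ampersand is handled in the same pass as the angle brackets, so the entities produced for < and > are never escaped a second time.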

How do I convert HTML text to normal text in Java?

Pass the HTML text to the html2text method and it will return the plain text.


2 Answers

Yes, Jsoup is the better option. The following converts the whole HTML text to plain text:

String plainText = Jsoup.parse(your_html_text).text();
Ranjit answered Sep 23 '22 08:09


Just getting rid of HTML tags is simple:

// Replace each run of one or more HTML tags (with optional
// whitespace in between) with a single space character.
String strippedText = htmlText.replaceAll("(?s)<[^>]*>(\\s*<[^>]*>)*", " ");

But unfortunately the requirements are never that simple:

Usually, <p> and <div> elements need separate handling, and there may be CDATA blocks containing > characters (e.g. JavaScript) that break the regex, and so on.
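
One way to address these cases with the standard library alone is sketched below. It is a rough illustration under stated assumptions, not the answerer's method: the class name HtmlToText and the method toPlainText are hypothetical, <p>, <div> and <br> are assumed to become line breaks, and script, style, CDATA and comment content is assumed to be unwanted. Character entities such as &amp; are not decoded, and malformed markup can still defeat the patterns, which is why a real parser is usually the safer choice.

```java
final class HtmlToText {

    // Regex-based sketch: strips tags while treating <p>/<div>/<br> as line
    // breaks and discarding content that may contain stray '>' characters.
    static String toPlainText(String html) {
        String text = html;
        // 1. Remove script/style elements, CDATA sections and comments
        //    together with their content.
        text = text.replaceAll("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>", " ");
        text = text.replaceAll("(?s)<!\\[CDATA\\[.*?\\]\\]>", " ");
        text = text.replaceAll("(?s)<!--.*?-->", " ");
        // 2. Turn paragraph, div and br tags into line breaks.
        text = text.replaceAll("(?i)</?(p|div)\\b[^>]*>|<br\\s*/?>", "\n");
        // 3. Replace every remaining tag with a space.
        text = text.replaceAll("<[^>]*>", " ");
        // 4. Collapse runs of spaces and tabs, then runs of blank lines.
        text = text.replaceAll("[ \\t\\r\\f]+", " ");
        text = text.replaceAll(" ?\\n[ \\n]*", "\n");
        return text.trim();
    }

    public static void main(String[] args) {
        String html = "<div><p>Hello <b>world</b></p>"
                + "<script>if (a > b) {}</script><p>Bye</p></div>";
        // prints "Hello world" and "Bye" on separate lines
        System.out.println(toPlainText(html));
    }
}
```

The order of the steps matters: script and CDATA content is removed first so that a > inside it never reaches the generic tag pattern in step 3.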

Sean Patrick Floyd answered Sep 24 '22 08:09