
Convert plain text to HTML text in Java


I have a Java program that receives plain text from a server. The plain text may contain URLs. Is there a class in the Java library that converts plain text to HTML, or any other library that does? If not, what is the solution?

Shaiful asked Feb 27 '11 18:02


2 Answers

You should do some replacements on the text programmatically. Here are some clues:

  • All newlines should be converted to "<br>\n" (the \n is for better readability of the output).
  • All CRs should be dropped (they only appear as part of DOS/Windows line endings).
  • All pairs of spaces should be replaced with " &nbsp;".
  • Replace "&" with "&amp;". If you apply the replacements one after another, do this one first; otherwise the "&" in "&lt;" or "&nbsp;" is escaped a second time.
  • Replace "<" with "&lt;", ">" with "&gt;" and '"' with "&quot;".
  • All other characters < 128 should be left as they are.
  • All other characters >= 128 should be written as "&#"+((int)myChar)+";", to make them readable in every encoding. For characters outside the Basic Multilingual Plane, use the code point rather than the two surrogate char values.
  • To autodetect your links, you could use a regex like "http://[^ ]+" or "www\.[^ ]+" and convert each match to "<a href=\""+url+"\">"+url+"</a>", as JB Nizet suggested, but only after having done all the other replacements.

The code to do this looks something like this:

    public static String escape(String s) {
        StringBuilder builder = new StringBuilder();
        boolean previousWasASpace = false;
        for (int i = 0; i < s.length(); ) {
            // Read whole code points so that surrogate pairs are not split
            int c = s.codePointAt(i);
            i += Character.charCount(c);
            if (c == ' ') {
                if (previousWasASpace) {
                    // Second space of a pair: emit a non-breaking space
                    builder.append("&nbsp;");
                    previousWasASpace = false;
                    continue;
                }
                previousWasASpace = true;
            } else {
                previousWasASpace = false;
            }
            switch (c) {
                case '<': builder.append("&lt;"); break;
                case '>': builder.append("&gt;"); break;
                case '&': builder.append("&amp;"); break;
                case '"': builder.append("&quot;"); break;
                case '\r': break; // drop CRs, as described in the list above
                case '\n': builder.append("<br>"); break;
                // We need Tab support here, because we print StackTraces as HTML
                case '\t': builder.append("&nbsp; &nbsp; &nbsp;"); break;
                default:
                    if (c < 128) {
                        builder.append((char) c);
                    } else {
                        // Numeric character reference, readable in every encoding
                        builder.append("&#").append(c).append(";");
                    }
            }
        }
        return builder.toString();
    }

However, the link conversion has yet to be added. If someone does it, please update the code.
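
One possible way to add the missing link conversion is sketched below. Instead of running a regex over the already escaped output (where a URL at the end of a line would run straight into the following "<br>"), it finds the URLs in the raw text first, escapes the text between them, and escapes each URL before wrapping it in an anchor. The class name PlainTextLinker, the helper escapeHtml and the deliberately simple URL pattern are assumptions made for this illustration; escapeHtml is a reduced stand-in for the escape method above, not a replacement for it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlainTextLinker {
    // Assumed, simplified URL pattern: http(s) only, and the last character
    // must not be common trailing punctuation such as '.' or ','.
    private static final Pattern URL =
            Pattern.compile("\\bhttps?://[^\\s<>\"]*[^\\s<>\".,;:!?)]");

    // Hypothetical helper: minimal escaping for text and attribute values.
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\r': break;                        // drop carriage returns
                case '\n': out.append("<br>\n"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Escapes the text between URLs and wraps each URL in an anchor.
    static String toHtml(String plain) {
        StringBuilder out = new StringBuilder();
        Matcher m = URL.matcher(plain);
        int last = 0;
        while (m.find()) {
            out.append(escapeHtml(plain.substring(last, m.start())));
            String url = escapeHtml(m.group());
            out.append("<a href=\"").append(url).append("\">")
               .append(url).append("</a>");
            last = m.end();
        }
        out.append(escapeHtml(plain.substring(last)));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHtml("See http://www.google.com/?a=1&b=2. <done>"));
    }
}
```

Because the URL is escaped as well, an "&" in a query string ends up as "&amp;" inside the href, which is what HTML expects there.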

Daniel answered Sep 22 '22 18:09


I found a solution using pattern matching. Here is my code:

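    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // URL pattern; capture group 1 is the whole matched URL
    String str = "(?i)\\b((?:https?://|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}/)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))+(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:\'\".,<>?«»“”‘’]))";
    Pattern patt = Pattern.compile(str);
    Matcher matcher = patt.matcher(plain);
    // Wrap every match in an anchor that uses the URL as both href and link text
    plain = matcher.replaceAll("<a href=\"$1\">$1</a>");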

Here are the input and output.

Input (the value of the variable plain):

some text and then the URL http://www.google.com and then some other text. 

Output:

some text and then the URL <a href="http://www.google.com">http://www.google.com</a> and then some other text. 
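
One thing to watch with this approach: the pattern also matches addresses without a scheme, such as www.google.com, and replaceAll then emits href="www.google.com", which a browser resolves as a relative path. A minimal sketch of one way around that is shown below, building each anchor individually with Matcher.appendReplacement. The class name SchemeAwareLinker and the shortened URL pattern are assumptions made for this illustration (the pattern is not the one from the code above), and the sketch does not HTML-escape the surrounding text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SchemeAwareLinker {
    // Assumed simplified pattern: http(s) URLs and bare "www." hosts,
    // excluding common trailing punctuation.
    private static final Pattern URL = Pattern.compile(
            "(?i)\\b(?:https?://|www\\.)[^\\s<>\"]*[^\\s<>\".,;:!?)]");

    static String linkify(String plain) {
        Matcher m = URL.matcher(plain);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String text = m.group();
            // Prepend a scheme when the match starts with "www." (case-insensitive)
            boolean bare = text.regionMatches(true, 0, "www.", 0, 4);
            String href = bare ? "http://" + text : text;
            String anchor = "<a href=\"" + href + "\">" + text + "</a>";
            // quoteReplacement keeps '$' and '\' in the URL from being
            // interpreted as group references
            m.appendReplacement(out, Matcher.quoteReplacement(anchor));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(linkify("Visit www.google.com or http://example.org/a$1 today."));
    }
}
```

The link text stays exactly as it appeared in the input; only the href gains the "http://" prefix.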

Shaiful answered Sep 18 '22 18:09