In my application, I use a JTextPane to display some log information. As I want to highlight some specific lines in this text (for example the error messages), I set the contentType to "text/html". This way, I can format my text.
Now, I have created a JButton that copies the content of this JTextPane to the clipboard. That part is easy, but my problem is that when I call myTextPane.getText(), I get the HTML code, such as:
<html>
<head>
</head>
<body>
blabla<br>
<font color="#FFCC66"><b>foobar</b></font><br>
blabla
</body>
</html>
instead of getting only the raw content:
blabla
foobar
blabla
Is there a way to get only the content of my JTextPane in plain text? Or do I need to transform the HTML into raw text by myself?
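No need to use the ParserCallback. Just ask the underlying Document for its text (note that getText() throws the checked BadLocationException):
String plainText = textPane.getDocument().getText(0, textPane.getDocument().getLength());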
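For completeness, here is a minimal sketch of that call wrapped in a helper that deals with the checked exception. The class name PlainTextCopy and the method name plainText are made up for illustration; only getDocument(), getText(int, int) and getLength() come from Swing.

```java
import javax.swing.JTextPane;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;

// Hypothetical helper, not part of Swing.
class PlainTextCopy {
    // Returns the text held by the pane's document model, without HTML markup.
    static String plainText(JTextPane pane) {
        Document doc = pane.getDocument();
        try {
            return doc.getText(0, doc.getLength());
        } catch (BadLocationException e) {
            // Not expected for the range [0, getLength()], so rethrow unchecked.
            throw new IllegalStateException(e);
        }
    }
}
```

The returned string may start with a line break coming from the document structure, so calling trim() on it before putting it on the clipboard is probably a good idea.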
Based on the accepted answer to: Removing HTML from a Java String
MyHtml2Text parser = new MyHtml2Text();
try {
parser.parse(new StringReader(myTextPane.getText()));
} catch (IOException ee) {
//handle exception
}
System.out.println(parser.getText());
Slightly modified version of the Html2Text class found in the answer I linked to:
import java.io.IOException;
import java.io.Reader; // needed for the parse(Reader) signature below
import javax.swing.text.html.*;
import javax.swing.text.html.parser.*;
public class MyHtml2Text extends HTMLEditorKit.ParserCallback {
StringBuffer s;
public MyHtml2Text() {}
public void parse(Reader in) throws IOException {
s = new StringBuffer();
ParserDelegator delegator = new ParserDelegator();
delegator.parse(in, this, Boolean.TRUE);
}
public void handleText(char[] text, int pos) {
s.append(text);
s.append("\n");
}
public String getText() {
return s.toString();
}
}
If you need more fine-grained handling, consider overriding more of the methods defined by HTMLEditorKit.ParserCallback.
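As an illustration of such finer-grained handling, the sketch below only starts a new line when the parser reports a <br> tag, instead of after every text chunk. The class name LineAwareHtml2Text and the static toText helper are my own assumptions; handleText and handleSimpleTag are the ParserCallback methods being overridden.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical variant of MyHtml2Text that reacts to <br> tags.
class LineAwareHtml2Text extends HTMLEditorKit.ParserCallback {
    private final StringBuilder s = new StringBuilder();

    // Parses the given HTML and returns the collected text.
    static String toText(String html) {
        LineAwareHtml2Text callback = new LineAwareHtml2Text();
        try (Reader in = new StringReader(html)) {
            new ParserDelegator().parse(in, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.s.toString();
    }

    @Override
    public void handleText(char[] text, int pos) {
        // Append the text as-is; line breaks are decided by the tags.
        s.append(text);
    }

    @Override
    public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        // <br> has no end tag, so it is reported as a simple tag.
        if (t == HTML.Tag.BR) {
            s.append('\n');
        }
    }
}
```

Block-level tags such as <p> or <div> would need similar treatment in handleStartTag or handleEndTag; that is left out here to keep the sketch short.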
Unfortunately, you need to do it yourself. Imagine that some of the content were HTML-specific, e.g. images: the text representation is unclear. Should the alt text be included or not, for instance?
(Is RegExp allowed? This isn't parsing, is it?)
Take the getText() result and use String.replaceAll() to filter out all tags. Then call trim() to remove leading and trailing whitespace. For the whitespace between your first and your last 'blabla' I don't see a general solution. Maybe you can split the rest around CRLF and trim all the Strings again.
(I'm no regexp expert - maybe someone can provide the regexp and earn some reputation ;) )
Edit: I just assumed that you don't use < and > in your text. Otherwise it's, let's say, a challenge.
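A rough sketch of the regexp idea described in this answer, under the same assumption that the text itself contains no < or > characters. The class name TagStripper and the method name stripTags are invented for the example, and HTML entities such as &amp;lt; are not decoded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the replaceAll + trim approach.
class TagStripper {
    static String stripTags(String html) {
        // Remove everything that looks like a tag: '<', any non-'>' chars, '>'.
        String noTags = html.replaceAll("<[^>]*>", "");
        List<String> lines = new ArrayList<>();
        // Split on any line break, trim each piece, drop the empty ones.
        for (String line : noTags.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return String.join("\n", lines);
    }
}
```

This relies on the HTML source having one line per visual line, which happens to be the case for the getText() output shown in the question; a <br> in the middle of a source line would not produce a line break here.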