 

Get all HTML as a String from HTMLDocument

Tags:

java

document

I'm coding in Java.

Does anyone know how I can get the content of a javax.swing.text.html.HTMLDocument as a String? This is what I've got so far:

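import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

URL url = new URL("http://www.test.com");
HTMLEditorKit kit = new HTMLEditorKit();
HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
// Ignore <meta> charset directives instead of throwing ChangedCharSetException
doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
try (Reader reader = new InputStreamReader(url.openConnection().getInputStream())) {
    kit.read(reader, doc, 0); // throws IOException, BadLocationException
}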

I need the content of the HTMLDocument as a String.

Example:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">    <html><head><meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1">

....... etc.

Any help would be appreciated. I need to use the HTMLDocument class so that the HTML is processed correctly. :)

Thanks,
Daniel

Asked May 6 '12 at 16:05 by Zelleriation



2 Answers

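import java.io.StringWriter;

// Serialize the whole document back to HTML (throws IOException, BadLocationException)
StringWriter writer = new StringWriter();
kit.write(writer, doc, 0, doc.getLength());
String s = writer.toString();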
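A self-contained sketch of this approach, assuming the HTML is already available as a String so it can run without a network connection (the `roundTrip` helper name is hypothetical). Note that `HTMLEditorKit.write` serializes the document model, so the result is regenerated HTML rather than a byte-for-byte copy of the original page; Swing's parser may drop or rewrite markup it does not understand.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class HtmlDocumentToString {

    // Hypothetical helper: parse HTML into an HTMLDocument, then write it back out as a String.
    static String roundTrip(String html) throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Ignore <meta> charset directives instead of throwing ChangedCharSetException
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        kit.read(new StringReader(html), doc, 0);

        // Serialize the whole document (offset 0, length getLength()) back to HTML
        StringWriter writer = new StringWriter();
        kit.write(writer, doc, 0, doc.getLength());
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("<html><body><p>Hello, world</p></body></html>"));
    }
}
```

For the page in the question, the `StringReader` would simply be replaced by the `InputStreamReader` over the URL connection.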
Answered Oct 1 '22 at 19:10 by Joop Eggen


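You don't need the editor and reader at all: just read the input stream into a String. For example, with Commons IO: IOUtils.toString(inputStream, StandardCharsets.UTF_8), assuming the page is UTF-8. (The single-argument IOUtils.toString(InputStream) is deprecated because it falls back to the platform default charset.)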
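Without Commons IO, the JDK can do the same since Java 9 via `InputStream.readAllBytes()`. This sketch assumes the page is UTF-8; it does not inspect the `Content-Type` header or any `<meta>` charset declaration. The `readAll` helper name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StreamToString {

    // Hypothetical helper: read an entire stream and decode it as UTF-8 (assumed charset).
    static String readAll(InputStream in) throws IOException {
        try (in) { // Java 9+: effectively final resource, closed when done
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "<!DOCTYPE html><html></html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(in)); // prints <!DOCTYPE html><html></html>
    }
}
```

For the URL in the question this would be `readAll(new URL("http://www.test.com").openStream())`. The result is the raw page exactly as served, with no processing by `HTMLDocument`.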

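Alternatively, if you only need the document's text content (without any HTML markup), use getText. AbstractDocument.getContent() is protected, so it cannot be called from outside a subclass:

String str = doc.getText(0, doc.getLength());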
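Because `getContent()` is protected in `AbstractDocument`, the public way to read the document's characters is `getText(0, getLength())`. A minimal sketch showing that this yields text only, with no tags (the `plainText` helper name is hypothetical; exact leading and trailing newlines depend on Swing's parser, so the output is trimmed before printing):

```java
import java.io.StringReader;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class DocumentText {

    // Hypothetical helper: parse markup into an HTMLDocument and return its text content only.
    static String plainText(String html) throws Exception {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        kit.read(new StringReader(html), doc, 0);
        // getText is public API; the element structure (tags) is not part of the result
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(plainText("<html><body><p>Hello <b>world</b></p></body></html>").trim());
    }
}
```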
Answered Oct 1 '22 at 20:10 by Bozho