Get all HTML as a String from HTMLDocument

Tags:

java

document

I'm coding in Java.

Does anyone know how I can get the content of a javax.swing.text.html.HTMLDocument as a String? This is what I've got so far:

URL url = new URL("http://www.test.com");

// Parse the page into an HTMLDocument via the editor kit.
HTMLEditorKit kit = new HTMLEditorKit();
HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
Reader htmlReader = new InputStreamReader(url.openConnection().getInputStream());
kit.read(htmlReader, doc, 0);

I need the content of the HTMLDocument as a String.

Example:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">    <html><head><meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1">

....... etc.

Any help would be appreciated. I need to use the HTMLDocument class so that the HTML is processed correctly :)

Thanks, Daniel


asked May 06 '12 16:05

Zelleriation

2 Answers

// Serialize the whole document (offset 0 to its full length) back to HTML.
StringWriter writer = new StringWriter();
kit.write(writer, doc, 0, doc.getLength());
String s = writer.toString();
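For context, here is a minimal, self-contained sketch that combines the question's parsing code with the snippet above. The class name HtmlDocumentToString and the helper toHtml are made up for illustration, and the input is a small inline string rather than a URL so it runs without network access. Note that the writer serializes the parsed document model, so the result is usually a normalized version of the markup rather than a byte-for-byte copy of the source.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class, not part of the original question or answer.
class HtmlDocumentToString {

    // Parses HTML from the reader into an HTMLDocument, then writes the
    // document back out as an HTML string using the same editor kit.
    static String toHtml(Reader source) throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        kit.read(source, doc, 0);

        StringWriter writer = new StringWriter();
        kit.write(writer, doc, 0, doc.getLength());
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        // Inline sample input (an assumption for the demo); a real program
        // would pass a Reader over the URL's input stream instead.
        String html = toHtml(new StringReader("<html><body><p>Hello</p></body></html>"));
        System.out.println(html);
    }
}
```

If the exact original markup matters (for example the DOCTYPE line), this round trip is probably not the right tool, since only what the parser kept in the document can be written back.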


answered Oct 01 '22 19:10

Joop Eggen

You don't need the editor kit and reader at all if you only want the raw HTML; just read the input stream directly. For example, with commons-io: IOUtils.toString(inputStream)

Or, if you want the text held by the document, you can use:

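// AbstractDocument.getContent() is protected, so call getText() on the document instead.
// Note: this returns the document's text only, without the HTML markup.
String str = doc.getText(0, doc.getLength());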
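As a rough sketch of the "just read the input stream" idea without commons-io, the JDK alone can do this on Java 9 or later via InputStream.readAllBytes(). The class name RawHtmlReader, the helper readAll, and the choice of UTF-8 are assumptions for illustration; a real page may declare a different charset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original answer.
class RawHtmlReader {

    // Reads the whole stream and decodes it with the given charset.
    // Requires Java 9+ for InputStream.readAllBytes().
    static String readAll(InputStream in, Charset charset) throws IOException {
        return new String(in.readAllBytes(), charset);
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for url.openStream() so the demo
        // runs offline; UTF-8 is assumed here.
        byte[] sample = "<html><body>Hi</body></html>".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(sample)) {
            System.out.println(readAll(in, StandardCharsets.UTF_8));
        }
    }
}
```

With a real page, the stream would come from url.openStream() inside the same try-with-resources block. Unlike the HTMLDocument route, this returns the markup exactly as the server sent it, with no parsing or normalization.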

answered Oct 01 '22 20:10

Bozho
