
How do I convert a Document made in jsoup (the Java HTML parser) into a String?

I have a document that was created with jsoup, like this:

Document doc = Jsoup.connect("http://en.wikipedia.org/").get(); 

How do I convert that doc into a String?

Hudson Hughes asked Jul 28 '11 20:07



1 Answer

Have you tried:

Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
String htmlString = doc.toString();

Since Document extends Element, it also has the method html(), which "Retrieves the element's inner HTML" according to the API, so this should work as well:

Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
String htmlString = doc.html();
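To compare the two calls without hitting the network, here is a minimal sketch that parses a fixed HTML string instead of fetching Wikipedia. The class name DocToString, the helper sample() and the sample markup are made up for illustration; it assumes jsoup is on the classpath. As far as I know, toString() on a jsoup node simply returns outerHtml(), and for a Document that should be the same markup html() gives you, but check this against the version you are using.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo class; not part of jsoup.
class DocToString {
    // Build a Document from a fixed string so the example runs offline.
    static Document sample() {
        return Jsoup.parse(
            "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>");
    }

    public static void main(String[] args) {
        Document doc = sample();

        String viaToString = doc.toString();    // same text as outerHtml()
        String viaOuterHtml = doc.outerHtml();  // the whole document as markup
        String viaHtml = doc.html();            // the document's inner HTML

        System.out.println(viaHtml);
        System.out.println("toString equals outerHtml: "
            + viaToString.equals(viaOuterHtml));
    }
}
```

Any of the three strings can then be written to a file or logged like any other String.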

Additional Info:

Each Document object holds a reference to an instance of the nested class Document.OutputSettings, which can be accessed via the outputSettings() method of Document. There you can enable or disable pretty-printing with the setter prettyPrint(true/false). See the API for Document and Document.OutputSettings for further information.
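A rough sketch of the pretty-printing switch, again parsing a fixed string rather than fetching a URL. The class PrettyPrintDemo and the helper render() are hypothetical names; only outputSettings(), prettyPrint(boolean) and html() are jsoup calls. It assumes jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo class; not part of jsoup.
class PrettyPrintDemo {
    static final String HTML =
        "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>";

    // Parse the markup, set pretty-printing on or off, and return the result.
    static String render(String html, boolean pretty) {
        Document doc = Jsoup.parse(html);
        doc.outputSettings().prettyPrint(pretty);
        return doc.html();
    }

    public static void main(String[] args) {
        // Pretty-printing is on by default: output is indented over several lines.
        System.out.println(render(HTML, true));
        // With pretty-printing off, jsoup adds no line breaks or indentation.
        System.out.println(render(HTML, false));
    }
}
```

Turning pretty-printing off is mostly useful when you want the string to stay close to the whitespace of the parsed input, for example before storing or diffing it.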

das_weezul answered Oct 19 '22 22:10
