Let's say I have this HTML:
<html>
<head>
</head>
<body>
<form method="post">
<select name="books">
<option value="111">111</option>
<option value="222">222</option>
</select>
</form>
</body>
</html>
I load it into Jsoup and get the result back:
Document doc = Jsoup.parse(html);
doc.outputSettings().indentAmount(4);
doc.outputSettings().charset("UTF-8");
doc.outputSettings().prettyPrint(true);
String result = doc.outerHtml();
The result is:
<html>
<head>
</head>
<body>
<form method="post">
<select name="books"> <option value="111">111</option> <option value="222">222</option> </select>
</form>
</body>
</html>
The <option> elements are all on the same line!
How can I get Jsoup to format the <option> elements so that the result matches the input in this example?
doc.outputSettings().charset("UTF-8");
When you parse HTML from a String, the charset defaults to UTF-8; a different charset only applies when you parse from a File or InputStream and specify one.
The charset on OutputSettings defaults to the input charset, which is UTF-8 in your case, so you only need to set it if you want the output charset to differ from the input.
Document.OutputSettings.charset()
Get the document's current output charset, which is used to control which characters are escaped when generating HTML (via the html() methods), and which are kept intact.
Where possible (when parsing from a URL or File), the document's output charset is automatically set to the input charset. Otherwise, it defaults to UTF-8.
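To make the "which characters are escaped" part concrete, here is a minimal sketch that renders the same text with two output charsets. The class name CharsetDemo and the helper render are hypothetical names chosen for illustration; only Jsoup.parse, outputSettings().charset(String) and html() come from Jsoup itself.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class CharsetDemo {
    // Hypothetical helper: parses the HTML and renders the body
    // using the requested output charset.
    static String render(String html, String charsetName) {
        Document doc = Jsoup.parse(html);
        doc.outputSettings().charset(charsetName);
        return doc.body().html();
    }

    public static void main(String[] args) {
        // \u00e9 is an e with an acute accent, which US-ASCII cannot encode.
        String html = "<p>caf\u00e9</p>";
        // With UTF-8 the character should be written as-is.
        System.out.println(render(html, "UTF-8"));
        // With US-ASCII the character should be written as an escape instead.
        System.out.println(render(html, "US-ASCII"));
    }
}
```

Since the input in the question is plain ASCII, switching the charset would not change its output, which is why the call can be dropped.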
doc.outputSettings().prettyPrint(true);
You don't need to enable pretty print; it is on by default.
Document.OutputSettings.prettyPrint()
Get if pretty printing is enabled. Default is true. If disabled, the HTML output methods will not re-format the output, and the output will generally look like the input.
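As the documentation hints, turning pretty printing off is another way to keep the layout of the input, at the cost of losing the indentation. The sketch below assumes the input already has one <option> per line; RawOutputDemo and renderRaw are hypothetical names used only for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class RawOutputDemo {
    // Hypothetical helper: renders the body without re-formatting it,
    // so the whitespace of the input is expected to be kept.
    static String renderRaw(String html) {
        Document doc = Jsoup.parse(html);
        doc.outputSettings().prettyPrint(false);
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<form method=\"post\">\n"
                + "<select name=\"books\">\n"
                + "<option value=\"111\">111</option>\n"
                + "<option value=\"222\">222</option>\n"
                + "</select>\n"
                + "</form>";
        System.out.println(renderRaw(html));
    }
}
```

This only helps when the input is already laid out the way you want; it does not indent anything.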
doc.outputSettings().outline(true);
This is the key setting. When it is not enabled, only block tags are formatted as blocks, and option is not a block tag. When it is enabled, all tags are treated as block elements.
Document.OutputSettings.outline()
Get if outline mode is enabled. Default is false. If enabled, the HTML output methods will consider all tags as block.
So your final block of code should look something like this:
Document doc = Jsoup.parse(html);
doc.outputSettings().indentAmount(4).outline(true);
String result = doc.outerHtml();
Output
<html>
<head>
</head>
<body>
<form method="post">
<select name="books">
<option value="111">111</option>
<option value="222">222</option>
</select>
</form>
</body>
</html>
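If you want to check the effect on your own Jsoup version, a small self-contained sketch such as the following may help. OutlineDemo, format and countOptionLines are hypothetical names; the exact whitespace Jsoup emits can differ between releases, so the code only counts the lines that contain an <option> tag rather than comparing against a fixed string.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class OutlineDemo {
    // Hypothetical helper: pretty prints the document with a 4-space indent,
    // optionally treating every tag as a block (outline mode).
    static String format(String html, boolean outline) {
        Document doc = Jsoup.parse(html);
        doc.outputSettings().indentAmount(4).outline(outline);
        return doc.outerHtml();
    }

    // Counts the output lines that contain an opening <option> tag.
    static long countOptionLines(String output) {
        long count = 0;
        for (String line : output.split("\n")) {
            if (line.contains("<option")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String html = "<form method=\"post\"><select name=\"books\">"
                + "<option value=\"111\">111</option>"
                + "<option value=\"222\">222</option>"
                + "</select></form>";
        String outlined = format(html, true);
        System.out.println(outlined);
        // In outline mode each <option> is expected on its own line.
        System.out.println("option lines: " + countOptionLines(outlined));
    }
}
```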