JSoup - Formatting the <option> elements

Let's say I have this HTML:

<html>
    <head>
    </head>
    <body>
        <form method="post">
            <select name="books"> 
                <option value="111">111</option>
                <option value="222">222</option>
            </select>
        </form>
    </body>
</html>

I load it in Jsoup and get the result back:

Document doc = Jsoup.parse(html);
doc.outputSettings().indentAmount(4);
doc.outputSettings().charset("UTF-8");
doc.outputSettings().prettyPrint(true);
String result = doc.outerHtml();

The result is:

<html>
    <head> 
    </head> 
    <body> 
        <form method="post"> 
            <select name="books"> <option value="111">111</option> <option value="222">222</option> </select> 
        </form>  
    </body>
</html>

The <option> elements are all on the same line!

How can I get Jsoup to format the <option> elements so that the output matches the input in this example?

electrotype asked Mar 12 '23 11:03


1 Answer

doc.outputSettings().charset("UTF-8");

When you parse HTML from a String, the default charset is UTF-8. A different charset only comes into play when you parse from a File or InputStream and specify one.

The charset on OutputSettings therefore defaults to the input charset, which is UTF-8 in your case. You only need to set it if you want the output charset to differ from the input.

Document.OutputSettings.charset()

Get the document's current output charset, which is used to control which characters are escaped when generating HTML (via the html() methods), and which are kept intact.

Where possible (when parsing from a URL or File), the document's output charset is automatically set to the input charset. Otherwise, it defaults to UTF-8.
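
To make the charset point concrete, here is a minimal sketch, assuming jsoup is on the classpath. The class name OutputCharsetDemo, the helper bodyHtml, and the sample markup are made up for illustration. It reads the default output charset of a document parsed from a String, then serializes the same markup with two different output charsets.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class OutputCharsetDemo {

    /** Hypothetical helper: parses the HTML and serializes its body with the given output charset. */
    static String bodyHtml(String html, String charsetName) {
        Document doc = Jsoup.parse(html);
        doc.outputSettings().charset(charsetName);
        return doc.body().html();
    }

    /** Returns the name of the output charset jsoup picks when parsing from a String. */
    static String defaultCharsetName(String html) {
        return Jsoup.parse(html).outputSettings().charset().name();
    }

    public static void main(String[] args) {
        // \u00e9 is "e" with an acute accent, written as an escape to avoid source-encoding issues.
        String html = "<p>caf\u00e9</p>";

        System.out.println(defaultCharsetName(html));   // the default for String input
        System.out.println(bodyHtml(html, "UTF-8"));    // the accented character is kept intact
        System.out.println(bodyHtml(html, "US-ASCII")); // the accented character is escaped
    }
}
```

The exact entity that jsoup emits in the ASCII case depends on the escape mode, so the sketch does not rely on it.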


doc.outputSettings().prettyPrint(true);

You don't need to enable pretty print; it is on by default.

Document.OutputSettings.prettyPrint()

Get if pretty printing is enabled. Default is true. If disabled, the HTML output methods will not re-format the output, and the output will generally look like the input.
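
As the documentation quoted above suggests, turning pretty printing off is another way to keep the output close to the input. The following is a rough sketch, assuming jsoup is on the classpath. PrettyPrintDemo, render, and the sample markup are illustrative names, not part of the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class PrettyPrintDemo {

    /** Hypothetical helper: serializes the body with pretty printing switched on or off. */
    static String render(String html, boolean prettyPrint) {
        Document doc = Jsoup.parse(html);
        doc.outputSettings().prettyPrint(prettyPrint);
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<select name=\"books\">\n"
                + "    <option value=\"111\">111</option>\n"
                + "    <option value=\"222\">222</option>\n"
                + "</select>";

        // Pretty printing is enabled unless you turn it off.
        System.out.println(Jsoup.parse(html).outputSettings().prettyPrint());

        // With pretty printing off, jsoup does not re-indent, so the original
        // line breaks between the <option> elements survive.
        System.out.println(render(html, false));
    }
}
```

Note that jsoup still normalizes the document structure (for example, it adds missing html, head and body elements), so this only preserves whitespace, not the input byte for byte.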


doc.outputSettings().outline(true);

This is the key setting. When it is not set, only block tags are formatted as blocks, each on its own line (option is not a block tag). When it is enabled, all tags are treated as block elements.

Document.OutputSettings.outline()

Get if outline mode is enabled. Default is false. If enabled, the HTML output methods will consider all tags as block.


So your final block of code should look something like this:

Document doc = Jsoup.parse(html);

doc.outputSettings().indentAmount(4).outline(true);

String result = doc.outerHtml();

Output

<html>
    <head> 
    </head> 
    <body> 
        <form method="post"> 
            <select name="books"> 
                <option value="111">111</option> 
                <option value="222">222</option> 
            </select> 
        </form>  
    </body>
</html>
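
If you want to try this end to end, here is a self-contained sketch of the same settings with the imports it needs, assuming jsoup is on the classpath. The class name OutlineDemo and the method format are made up for this example. Exact whitespace can vary between jsoup versions, so the sketch only relies on each <option> starting on its own line.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class OutlineDemo {

    /** Hypothetical helper: pretty-prints the HTML with every tag treated as a block. */
    static String format(String html) {
        Document doc = Jsoup.parse(html);
        // Pretty printing is already on by default; outline mode makes jsoup
        // treat <option> (and every other tag) as a block element.
        doc.outputSettings().indentAmount(4).outline(true);
        return doc.outerHtml();
    }

    public static void main(String[] args) {
        String html = "<form method=\"post\"><select name=\"books\">"
                + "<option value=\"111\">111</option>"
                + "<option value=\"222\">222</option>"
                + "</select></form>";
        System.out.println(format(html));
    }
}
```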

Zack answered Mar 29 '23 22:03