I am parsing an XML file with JSoup, modifying some elements, and writing the result back out.
The output file contains extra spaces and line breaks inside the elements. Is there a way to print it in the original format?
Original:
<attributes>
<divisions>4</divisions>
<key>
<fifths>0</fifths>
<mode>major</mode>
</key>
...
New:
<attributes>
<divisions>
4
</divisions>
<key>
<fifths>
0
</fifths>
<mode>
major
</mode>
</key>
...
Any idea how to remove the extra spaces and line breaks inside the elements?
I currently read in and print the document like this:
doc = Jsoup.parse(is, "UTF-8", "", Parser.xmlParser());
BufferedWriter htmlWriter = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("output.xml"), "UTF-8"));
htmlWriter.write(doc.toString());
htmlWriter.close(); // flush the buffered output to output.xml
With some help from Aleksandr M I solved it in the following way:
doc.outputSettings().indentAmount(0).prettyPrint(false);
A little less nice, but this also seemed to do the trick:
htmlWriter.write(doc.toString().replaceAll(">\\s+",">").replaceAll("\\s+<","<"));
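For anyone who wants to see what the regex variant does without setting up JSoup, here is a minimal sketch that applies the same two replaceAll calls to a hard-coded string. The class name WhitespaceStrip, the helper stripInterTagWhitespace and the sample input are made up for illustration; only the two regular expressions come from the line above.

```java
class WhitespaceStrip {

    // Hypothetical helper: removes whitespace that directly follows '>'
    // and whitespace that directly precedes '<'.
    static String stripInterTagWhitespace(String xml) {
        return xml.replaceAll(">\\s+", ">").replaceAll("\\s+<", "<");
    }

    public static void main(String[] args) {
        // Sample input imitating the pretty-printed output from the question.
        String pretty = "<attributes>\n <divisions>\n  4\n </divisions>\n</attributes>";

        String compact = stripInterTagWhitespace(pretty);

        // prints <attributes><divisions>4</divisions></attributes>
        System.out.println(compact);
    }
}
```

Note that this puts everything on one line, because the line breaks between sibling elements are removed too, and it also trims leading and trailing whitespace in text content where that whitespace might matter. That is probably why the prettyPrint(false) setting is the nicer of the two.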
Try this:
doc = Jsoup.parse(is, "UTF-8", "", Parser.xmlParser());
doc.outputSettings().escapeMode(Entities.EscapeMode.xhtml);
..
..
Hope this helps