
Output JSoup without added spaces and line breaks around the elements

I am parsing an XML file with JSoup, modifying some of its elements, and writing the result back out.

The output file contains extra spaces and line breaks around the element content. Is there a way to write it out in the original format?

Original:

  <attributes>
        <divisions>4</divisions>
        <key>
          <fifths>0</fifths>
          <mode>major</mode>
          </key>
...

New:

<attributes> 
    <divisions>
     4
    </divisions> 
    <key> 
     <fifths>
      0
     </fifths> 
     <mode>
      major
     </mode> 
    </key> 
...

Any idea how to remove the added spaces and line breaks from the elements?

I currently read in and print the document like this:

doc = Jsoup.parse(is, "UTF-8", "", Parser.xmlParser());

BufferedWriter htmlWriter = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("output.xml"), "UTF-8"));
htmlWriter.write(doc.toString());
dorien asked Mar 05 '15 11:03



2 Answers

With some help from Aleksandr M, I solved it by turning off pretty-printing in the document's output settings, so JSoup no longer re-indents the elements:

doc.outputSettings().indentAmount(0).prettyPrint(false);

A less clean alternative that also seemed to do the trick is to strip the whitespace next to tags from the serialized string:

htmlWriter.write(doc.toString().replaceAll(">\\s+",">").replaceAll("\\s+<","<"));
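To see what the two replaceAll calls actually do, here is a minimal sketch that uses only the standard library, so it runs without JSoup. The class name WhitespaceStrip and the method stripAroundTags are made up for illustration; the two regular expressions are the ones from the line above.

```java
import java.util.regex.Pattern;

public class WhitespaceStrip {
    // Whitespace directly after a '>' and directly before a '<'
    // (the same two patterns as in the answer above).
    private static final Pattern AFTER_TAG = Pattern.compile(">\\s+");
    private static final Pattern BEFORE_TAG = Pattern.compile("\\s+<");

    // Hypothetical helper: removes every whitespace run that touches a tag.
    public static String stripAroundTags(String xml) {
        String withoutLeading = AFTER_TAG.matcher(xml).replaceAll(">");
        return BEFORE_TAG.matcher(withoutLeading).replaceAll("<");
    }

    public static void main(String[] args) {
        // Pretty-printed output similar to the "New" sample in the question.
        String pretty = "<key> \n <fifths>\n  0\n </fifths> \n <mode>\n  major\n </mode> \n</key>";
        System.out.println(stripAroundTags(pretty));
        // prints <key><fifths>0</fifths><mode>major</mode></key>

        // Caveat: whitespace that is part of mixed content is removed as well.
        System.out.println(stripAroundTags("<p>a <b>b</b> c</p>"));
        // prints <p>a<b>b</b>c</p>
    }
}
```

Note that this collapses the whole document onto one line instead of restoring the original indentation, and it also deletes meaningful spaces in mixed content, which is probably why it is the "less clean" option compared with disabling pretty-printing.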
dorien answered Oct 02 '22 11:10


Try this:

doc = Jsoup.parse(is, "UTF-8", "", Parser.xmlParser());
doc.outputSettings().escapeMode(Entities.EscapeMode.xhtml);
..
..

Hope this helps

web-nomad answered Oct 01 '22 11:10