How to remove extra empty lines from XML file?

In short: I have many empty lines generated in an XML file, and I am looking for a way to remove them to clean up the file. How can I do that?

In more detail: I currently have this XML file:

<recent>
  <paths>
    <path>path1</path>
    <path>path2</path>
    <path>path3</path>
    <path>path4</path>
  </paths>
</recent>

And I use this Java code to delete all the <path> elements and add new ones instead:

public void savePaths( String recentFilePath ) {
    ArrayList<String> newPaths = getNewRecentPaths();
    Document recentDomObject = getXMLFile( recentFilePath );  // Parse the file into a DOM Document.
    NodeList pathNodes = recentDomObject.getElementsByTagName( "path" );   // Get all <path> nodes.

    //1. Remove all old path nodes (backwards, because the NodeList is live):
    for ( int i = pathNodes.getLength() - 1; i >= 0; i-- ) {
        Element pathNode = (Element)pathNodes.item( i );
        pathNode.getParentNode().removeChild( pathNode );
    }

    //2. Append all new paths to the first <paths> node:
    Element pathsElement = (Element)recentDomObject.getElementsByTagName( "paths" ).item( 0 );

    for( String newPath: newPaths ) {
        Element newPathElement = recentDomObject.createElement( "path" );
        newPathElement.setTextContent( newPath );
        pathsElement.appendChild( newPathElement );
    }

    //3. Save the XML changes:
    saveXMLFile( recentFilePath, recentDomObject );
}

After executing this method a number of times, I get an XML file with the right content, but with many empty lines after the <paths> tag and before the first <path> tag, like this:

<recent>
  <paths>





    <path>path5</path>
    <path>path6</path>
    <path>path7</path>
  </paths>
</recent>

Does anyone know how to fix that?

-------------------------------------------

Edit: Added the getXMLFile(...) and saveXMLFile(...) code:

public Document getXMLFile( String filePath ) { 
    File xmlFile = new File( filePath );

    try {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document domObject = db.parse( xmlFile );
        domObject.getDocumentElement().normalize();

        return domObject;
    } catch (Exception e) {
        e.printStackTrace();
    }

    return null;
}

public void saveXMLFile( String filePath, Document domObject ) {
    File xmlOutputFile = null;
    FileOutputStream fos = null;

    try {
        xmlOutputFile = new File( filePath );
        fos = new FileOutputStream( xmlOutputFile );
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty( OutputKeys.INDENT, "yes" );
        transformer.setOutputProperty( "{http://xml.apache.org/xslt}indent-amount", "2" );
        DOMSource xmlSource = new DOMSource( domObject );
        StreamResult xmlResult = new StreamResult( fos );
        transformer.transform( xmlSource, xmlResult );  // Save the XML file.
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (TransformerConfigurationException e) {
        e.printStackTrace();
    } catch (TransformerException e) {
        e.printStackTrace();
    } finally {
        if (fos != null)
            try {
                fos.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
    }
}

asked Oct 01 '12 08:10

Brad

1 Answer

First, an explanation of why this happens (which might be a bit off, since you didn't include the code that loads the XML file into a DOM object).

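When you read an XML document from a file, the whitespace between tags constitutes valid DOM nodes according to the DOM specification, so the XML parser turns each such run of whitespace into a DOM node of type TEXT. In savePaths(), removing the <path> elements leaves the whitespace TEXT nodes that sat between them in place, and the new <path> elements are appended after them. Each save writes those leftover line breaks back out, and on the next load the indentation the Transformer added becomes more whitespace TEXT nodes, which is why the empty lines keep piling up after <paths>.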
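
A minimal sketch of this behaviour, assuming the default (non-validating) DocumentBuilderFactory that getXMLFile() uses and a trimmed copy of the question's file inlined as a string; the class name WhitespaceNodesDemo and its helpers are hypothetical. It counts the children of <paths> before and after removing the <path> elements the way savePaths() does:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WhitespaceNodesDemo {
    // A trimmed copy of the question's file, inlined for the demo.
    static final String XML =
          "<recent>\n"
        + "  <paths>\n"
        + "    <path>path1</path>\n"
        + "    <path>path2</path>\n"
        + "  </paths>\n"
        + "</recent>\n";

    // Parses with default settings, like getXMLFile() in the question.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Counts the direct children of parent that have the given node type.
    static int countChildren(Node parent, short type) {
        int count = 0;
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == type) count++;
        }
        return count;
    }

    // Removes the <path> elements the same way savePaths() does.
    static void removePathElements(Document doc) {
        NodeList pathNodes = doc.getElementsByTagName("path");
        for (int i = pathNodes.getLength() - 1; i >= 0; i--) {
            Node p = pathNodes.item(i);
            p.getParentNode().removeChild(p);
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        Node paths = doc.getElementsByTagName("paths").item(0);
        System.out.println("elements: " + countChildren(paths, Node.ELEMENT_NODE)); // elements: 2
        System.out.println("text:     " + countChildren(paths, Node.TEXT_NODE));    // text:     3

        removePathElements(doc);
        // The whitespace TEXT nodes that surrounded the <path> elements are still there.
        System.out.println("text after removal: " + countChildren(paths, Node.TEXT_NODE)); // text after removal: 3
    }
}
```

Those three leftover TEXT nodes hold exactly the line breaks and indentation that reappear as empty lines once the document is written back out.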

To get rid of these whitespace nodes, there are three approaches I can think of:

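Associate the XML with a schema such as a DTD (setValidating(true) essentially controls DTD validation), and then call setValidating(true) along with setIgnoringElementContentWhitespace(true) on the DocumentBuilderFactory. (Note: setIgnoringElementContentWhitespace only works if the parser is in validating mode, which is why you must use setValidating(true). It also only drops whitespace inside elements whose declared content model is element-only.)
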
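A minimal sketch of the first approach, assuming an internal DTD written to match the question's file (the DTD is not part of the original question or answer, and IgnorableWhitespaceDemo is a hypothetical name). Declaring <paths> as (path*) makes its content element-only, so the whitespace between the <path> elements is ignorable and the parser should leave it out of the DOM:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class IgnorableWhitespaceDemo {
    // Hypothetical internal DTD for the question's file. Declaring <paths>
    // as (path*) gives it element-only content, so the whitespace between
    // the <path> elements counts as ignorable.
    static final String XML =
          "<!DOCTYPE recent [\n"
        + "  <!ELEMENT recent (paths)>\n"
        + "  <!ELEMENT paths (path*)>\n"
        + "  <!ELEMENT path (#PCDATA)>\n"
        + "]>\n"
        + "<recent>\n"
        + "  <paths>\n"
        + "    <path>path1</path>\n"
        + "    <path>path2</path>\n"
        + "  </paths>\n"
        + "</recent>\n";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(true);                       // turns on DTD validation
        dbf.setIgnoringElementContentWhitespace(true); // only honoured in validating mode
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Node paths = parse(XML).getElementsByTagName("paths").item(0);
        // Only the two <path> elements should be left as children of <paths>.
        System.out.println(paths.getChildNodes().getLength());
    }
}
```

Keep in mind that the DTD has to be available every time the file is parsed, so the saved file must keep carrying (or referencing) it.
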
Write an XSL transformation that copies all nodes except whitespace-only TEXT nodes.
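
A minimal sketch of the XSL approach: an identity transform plus an empty template that matches whitespace-only text nodes, so they are simply not copied. The stylesheet and the class name StripWhitespaceXsl are illustrative, not taken from the original answer, and the demo uses strings instead of files. In saveXMLFile() you could pass such a stylesheet to transformerFactory.newTransformer(...) instead of calling the no-argument version:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StripWhitespaceXsl {
    // Identity transform plus an empty template for whitespace-only text nodes.
    // A pattern with a predicate has a higher default priority than node(),
    // so those nodes produce no output while everything else is copied.
    static final String XSL =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" indent=\"yes\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"text()[not(normalize-space())]\"/>"
        + "</xsl:stylesheet>";

    static String clean(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        // The same (JDK/Xalan-specific) indent-amount property the question uses.
        t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Input shaped like the question's broken output.
        String messy = "<recent>\n  <paths>\n\n\n\n    <path>path5</path>\n"
                     + "    <path>path6</path>\n  </paths>\n</recent>\n";
        System.out.print(clean(messy));
    }
}
```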

Use Java code to do this: use XPath to find all whitespace-only TEXT nodes, iterate through them and remove each one from its parent (using getParentNode().removeChild()). Something like this would do (doc would be your DOM document object):

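import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Select every text node that is empty after whitespace normalization, then detach each one.
XPath xp = XPathFactory.newInstance().newXPath();
NodeList nl = (NodeList) xp.evaluate("//text()[normalize-space(.)='']", doc, XPathConstants.NODESET);

for (int i = 0; i < nl.getLength(); ++i) {
    Node node = nl.item(i);
    node.getParentNode().removeChild(node);
}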
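
To tie this back to the question: a sketch of one savePaths() round with the XPath cleanup applied right after parsing, before the old <path> elements are removed. Files are replaced by strings so it runs on its own, and SavePathsSketch, stripWhitespaceNodes and the string-based savePaths signature are hypothetical; in the real code the cleanup call would most naturally go at the end of getXMLFile():

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SavePathsSketch {
    // Removes every whitespace-only text node (the third approach above).
    static void stripWhitespaceNodes(Document doc) throws Exception {
        NodeList nl = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//text()[normalize-space(.)='']", doc, XPathConstants.NODESET);
        for (int i = 0; i < nl.getLength(); ++i) {
            Node node = nl.item(i);
            node.getParentNode().removeChild(node);
        }
    }

    // One round of the question's savePaths(), with files replaced by strings.
    static String savePaths(String xml, List<String> newPaths) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        stripWhitespaceNodes(doc); // clean up before re-serializing with INDENT=yes

        NodeList pathNodes = doc.getElementsByTagName("path");
        for (int i = pathNodes.getLength() - 1; i >= 0; i--) {
            Node p = pathNodes.item(i);
            p.getParentNode().removeChild(p);
        }
        Element paths = (Element) doc.getElementsByTagName("paths").item(0);
        for (String newPath : newPaths) {
            Element e = doc.createElement("path");
            e.setTextContent(newPath);
            paths.appendChild(e);
        }

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<recent>\n  <paths>\n    <path>path1</path>\n  </paths>\n</recent>\n";
        // Repeated saves should no longer accumulate empty lines.
        for (int round = 0; round < 3; round++) {
            xml = savePaths(xml, List.of("path5", "path6", "path7"));
        }
        System.out.print(xml);
    }
}
```

Because the whitespace nodes are gone before the indenting Transformer runs, the serializer is the only thing producing line breaks, so each round should produce the same layout instead of piling up empty lines.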

answered Oct 19 '22 21:10

Isaac
