Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to remove extra empty lines from XML file?

Tags:

In short; i have many empty lines generated in an XML file, and i am looking for a way to remove them as a way of leaning the file. How can i do that ?

For detailed explanation; I currently have this XML file :

<recent>
  <paths>
    <path>path1</path>
    <path>path2</path>
    <path>path3</path>
    <path>path4</path>
  </paths>
</recent>

And i use this Java code to delete all tags, and add new ones instead :

public void savePaths( String recentFilePath ) {
    ArrayList<String> newPaths = getNewRecentPaths();
    Document recentDomObject = getXMLFile( recentFilePath );  // Get the <recent> element.
    NodeList pathNodes = recentDomObject.getElementsByTagName( "path" );   // Get all <path> nodes.

    //1. Remove all old path nodes :
        for ( int i = pathNodes.getLength() - 1; i >= 0; i-- ) { 
            Element pathNode = (Element)pathNodes.item( i );
            pathNode.getParentNode().removeChild( pathNode );
        }

    //2. Save all new paths :
        Element pathsElement = (Element)recentDomObject.getElementsByTagName( "paths" ).item( 0 );   // Get the first <paths> node.

        for( String newPath: newPaths ) {
            Element newPathElement = recentDomObject.createElement( "path" );
            newPathElement.setTextContent( newPath );
            pathsElement.appendChild( newPathElement );
        }

    //3. Save the XML changes :
        saveXMLFile( recentFilePath, recentDomObject ); 
}

After executing this method a number of times i get an XML file with right results, but with many empty lines after the "paths" tag and before the first "path" tag, like this :

<recent>
  <paths>





    <path>path5</path>
    <path>path6</path>
    <path>path7</path>
  </paths>
</recent>

Anyone knows how to fix that ?

------------------------------------------- Edit: Add the getXMLFile(...), saveXMLFile(...) code.

public Document getXMLFile( String filePath ) { 
    File xmlFile = new File( filePath );

    try {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document domObject = db.parse( xmlFile );
        domObject.getDocumentElement().normalize();

        return domObject;
    } catch (Exception e) {
        e.printStackTrace();
    }

    return null;
}

public void saveXMLFile( String filePath, Document domObject ) {
    File xmlOutputFile = null;
    FileOutputStream fos = null;

    try {
        xmlOutputFile = new File( filePath );
        fos = new FileOutputStream( xmlOutputFile );
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty( OutputKeys.INDENT, "yes" );
        transformer.setOutputProperty( "{http://xml.apache.org/xslt}indent-amount", "2" );
        DOMSource xmlSource = new DOMSource( domObject );
        StreamResult xmlResult = new StreamResult( fos );
        transformer.transform( xmlSource, xmlResult );  // Save the XML file.
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (TransformerConfigurationException e) {
        e.printStackTrace();
    } catch (TransformerException e) {
        e.printStackTrace();
    } finally {
        if (fos != null)
            try {
                fos.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
    }
}
like image 659
Brad Avatar asked Oct 01 '12 08:10

Brad


People also ask

Can XML files have blank lines?

The XML file is still valid and well-formed even with those blank lines. The blank lines can be ignored. To resolve this issue, disable Format Output at the session level or ensure that some data is passed to the XML view.


1 Answers

First, an explanation of why this happens — which might be a bit off since you didn't include the code that is used to load the XML file into a DOM object.

When you read an XML document from a file, the whitespaces between tags actually constitute valid DOM nodes, according to the DOM specification. Therefore, the XML parser treats each such sequence of whitespaces as a DOM node (of type TEXT);

To get rid of it, there are three approaches I can think of:

  • Associate the XML with a schema, and then use setValidating(true) along with setIgnoringElementContentWhitespace(true) on the DocumentBuilderFactory.

    (Note: setIgnoringElementContentWhitespace will only work if the parser is in validating mode, which is why you must use setValidating(true))

  • Write an XSL to process all nodes, filtering out whitespace-only TEXT nodes.
  • Use Java code to do this: use XPath to find all whitespace-only TEXT nodes, iterate through them and remove each one from its parent (using getParentNode().removeChild()). Something like this would do (doc would be your DOM document object):

    XPath xp = XPathFactory.newInstance().newXPath();
    NodeList nl = (NodeList) xp.evaluate("//text()[normalize-space(.)='']", doc, XPathConstants.NODESET);
    
    for (int i=0; i < nl.getLength(); ++i) {
        Node node = nl.item(i);
        node.getParentNode().removeChild(node);
    }
    
like image 114
Isaac Avatar answered Oct 19 '22 21:10

Isaac