In short; i have many empty lines generated in an XML file, and i am looking for a way to remove them as a way of leaning the file. How can i do that ?
For detailed explanation; I currently have this XML file :
<recent>
<paths>
<path>path1</path>
<path>path2</path>
<path>path3</path>
<path>path4</path>
</paths>
</recent>
And i use this Java code to delete all tags, and add new ones instead :
public void savePaths( String recentFilePath ) {
ArrayList<String> newPaths = getNewRecentPaths();
Document recentDomObject = getXMLFile( recentFilePath ); // Get the <recent> element.
NodeList pathNodes = recentDomObject.getElementsByTagName( "path" ); // Get all <path> nodes.
//1. Remove all old path nodes :
for ( int i = pathNodes.getLength() - 1; i >= 0; i-- ) {
Element pathNode = (Element)pathNodes.item( i );
pathNode.getParentNode().removeChild( pathNode );
}
//2. Save all new paths :
Element pathsElement = (Element)recentDomObject.getElementsByTagName( "paths" ).item( 0 ); // Get the first <paths> node.
for( String newPath: newPaths ) {
Element newPathElement = recentDomObject.createElement( "path" );
newPathElement.setTextContent( newPath );
pathsElement.appendChild( newPathElement );
}
//3. Save the XML changes :
saveXMLFile( recentFilePath, recentDomObject );
}
After executing this method a number of times i get an XML file with right results, but with many empty lines after the "paths" tag and before the first "path" tag, like this :
<recent>
<paths>
<path>path5</path>
<path>path6</path>
<path>path7</path>
</paths>
</recent>
Anyone knows how to fix that ?
------------------------------------------- Edit: Add the getXMLFile(...), saveXMLFile(...) code.
public Document getXMLFile( String filePath ) {
File xmlFile = new File( filePath );
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document domObject = db.parse( xmlFile );
domObject.getDocumentElement().normalize();
return domObject;
} catch (Exception e) {
e.printStackTrace();
}
return null;
}
public void saveXMLFile( String filePath, Document domObject ) {
File xmlOutputFile = null;
FileOutputStream fos = null;
try {
xmlOutputFile = new File( filePath );
fos = new FileOutputStream( xmlOutputFile );
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty( OutputKeys.INDENT, "yes" );
transformer.setOutputProperty( "{http://xml.apache.org/xslt}indent-amount", "2" );
DOMSource xmlSource = new DOMSource( domObject );
StreamResult xmlResult = new StreamResult( fos );
transformer.transform( xmlSource, xmlResult ); // Save the XML file.
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (TransformerConfigurationException e) {
e.printStackTrace();
} catch (TransformerException e) {
e.printStackTrace();
} finally {
if (fos != null)
try {
fos.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
The XML file is still valid and well-formed even with those blank lines. The blank lines can be ignored. To resolve this issue, disable Format Output at the session level or ensure that some data is passed to the XML view.
First, an explanation of why this happens — which might be a bit off since you didn't include the code that is used to load the XML file into a DOM object.
When you read an XML document from a file, the whitespaces between tags actually constitute valid DOM nodes, according to the DOM specification. Therefore, the XML parser treats each such sequence of whitespaces as a DOM node (of type TEXT
);
To get rid of it, there are three approaches I can think of:
Associate the XML with a schema, and then use setValidating(true)
along with setIgnoringElementContentWhitespace(true)
on the DocumentBuilderFactory
.
(Note: setIgnoringElementContentWhitespace
will only work if the parser is in validating mode, which is why you must use setValidating(true)
)
TEXT
nodes.Use Java code to do this: use XPath to find all whitespace-only TEXT
nodes, iterate through them and remove each one from its parent (using getParentNode().removeChild()
). Something like this would do (doc
would be your DOM document object):
XPath xp = XPathFactory.newInstance().newXPath();
NodeList nl = (NodeList) xp.evaluate("//text()[normalize-space(.)='']", doc, XPathConstants.NODESET);
for (int i=0; i < nl.getLength(); ++i) {
Node node = nl.item(i);
node.getParentNode().removeChild(node);
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With