<node> test
test
test
</node>
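I want my XML parser to read the characters in <node> and:
collapse runs of whitespace (spaces, tabs, newlines) into a single space;
if the node contains XML-encoded characters, that is tabs (&#x9;), newlines (&#xA;) or spaces (&#x20;), leave them as they are.
I'm trying the code below, but it preserves the duplicated whitespace.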
dbf = DocumentBuilderFactory.newInstance();
dbf.setIgnoringComments( true );
dbf.setNamespaceAware( namespaceAware );
db = dbf.newDocumentBuilder();
doc = db.parse( inputStream );
Is there any way to do what I want?
Thanks!
The first part - collapsing repeated whitespace - is relatively easy, though I don't think the parser will do it for you:
// Parse the stream into a DOM by evaluating the root path
InputSource stream = new InputSource(inputStream);
XPath xpath = XPathFactory.newInstance().newXPath();
Document doc = (Document) xpath.evaluate("/", stream, XPathConstants.NODE);
// Select every text node in the document
NodeList nodes = (NodeList) xpath.evaluate("//text()", doc,
        XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); i++) {
    Text text = (Text) nodes.item(i);
    // Replace each run of two or more whitespace characters with one space
    text.setTextContent(text.getTextContent().replaceAll("\\s{2,}", " "));
}
// check results
TransformerFactory.newInstance()
        .newTransformer()
        .transform(new DOMSource(doc), new StreamResult(System.out));
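For reference, here is a minimal self-contained sketch of the same idea, run against the sample from the question. The class name WhitespaceCollapser and the method collapse are made up for illustration. It also assumes that single newlines and tabs should become spaces as well, so it uses \s+ where the snippet above uses \s{2,} (which leaves a lone newline or tab untouched).

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original answer.
class WhitespaceCollapser {

    // Parses the XML, rewrites every text node, and returns the root element's text.
    static String collapse(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        InputSource source = new InputSource(new StringReader(xml));
        Document doc = (Document) xpath.evaluate("/", source, XPathConstants.NODE);
        NodeList nodes = (NodeList) xpath.evaluate("//text()", doc, XPathConstants.NODESET);
        for (int i = 0; i < nodes.getLength(); i++) {
            Text text = (Text) nodes.item(i);
            // Assumption: \s+ instead of \s{2,}, so a lone newline or tab also becomes a space.
            text.setData(text.getData().replaceAll("\\s+", " "));
        }
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // The sample from the question: one space, then three words separated by newlines.
        System.out.println("[" + collapse("<node> test\ntest\ntest\n</node>") + "]");
    }
}
```

Leading and trailing whitespace is reduced to one space rather than removed; add a trim() if that is not what you want.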
This is the hard part:
If the node contains XML encoded characters: tabs (&#x9;), newlines (&#xA;) or whitespaces (&#x20;) - they should be left.
The parser will always turn "&#x9;" into "\t", so you may need to write your own XML parser.
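A quick way to see this for yourself is the sketch below (the class name CharRefDemo and the method textOf are invented for the example). It parses one document with an encoded tab and one with a literal tab, then compares the text the DOM hands back.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, not part of the original answer.
class CharRefDemo {

    // Parses the XML with the default DOM parser and returns the root element's text.
    static String textOf(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String encoded = textOf("<node>a&#x9;b</node>"); // tab written as a character reference
        String literal = textOf("<node>a\tb</node>");    // tab written as a raw character
        // Both strings contain a plain tab, so the application cannot tell them apart.
        System.out.println(encoded.equals(literal));
    }
}
```

Once the document is in the DOM, no trace of the original spelling is left, which is why the whitespace clean-up above cannot skip the encoded characters.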
According to the author of Saxon:
I don't think any XML parser will report numeric character references to the application - they will always be expanded. Really, your application shouldn't care about this any more than it cares about how much whitespace there is between attributes.
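If the distinction really does matter to you, one possible workaround short of writing a parser is to pre-process the raw XML. This is not from the quote above and is only a sketch under assumptions: swap the numeric references for placeholder characters before parsing, collapse whitespace, then swap the placeholders back. The class CharRefPreserver, its methods and the choice of private-use code points U+E009, U+E00A and U+E020 are all made up for illustration. It assumes the input never contains those code points, and being a plain regex pass it would also rewrite matching text inside CDATA sections and comments.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; a sketch, not a general-purpose solution.
class CharRefPreserver {

    // Numeric references for tab, newline and space, in hex (group 1) or decimal (group 2).
    private static final Pattern REF =
            Pattern.compile("&#(?:x0*(9|[aA]|20)|0*(9|10|32));");

    // Replaces each matching reference with a private-use placeholder (0xE000 + code point).
    static String protect(String rawXml) {
        Matcher m = REF.matcher(rawXml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int cp = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
            m.appendReplacement(out, String.valueOf((char) (0xE000 + cp)));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Turns the placeholders back into the real tab, newline and space characters.
    static String restore(String text) {
        return text.replace('\uE009', '\t').replace('\uE00A', '\n').replace('\uE020', ' ');
    }

    // Parses the protected XML, collapses ordinary whitespace, then restores the encoded characters.
    static String normalizedText(String rawXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(protect(rawXml))));
        String collapsed = doc.getDocumentElement().getTextContent().replaceAll("\\s+", " ");
        return restore(collapsed);
    }

    public static void main(String[] args) throws Exception {
        // The encoded tab and the two encoded spaces survive; the raw newlines collapse.
        System.out.println("[" + normalizedText("<node> a&#x9;b\n\nc&#x20;&#x20;d </node>") + "]");
    }
}
```

The placeholders are ordinary non-whitespace characters as far as \s is concerned, which is what lets the encoded tab, newline and space pass through the collapsing step unchanged.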