I want to extract specific nodes from a large XML file. That works well, until a wild CDATA without any content appears.
The output:
ERROR: ''
javax.xml.transform.TransformerException: java.lang.IndexOutOfBoundsException
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:732)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:336)
at xml_test.XML_Test.extractXML2(XML_Test.java:698)
at xml_test.XML_Test.main(XML_Test.java:811)
Caused by: java.lang.IndexOutOfBoundsException
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.getTextCharacters(XMLStreamReaderImpl.java:1143)
at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.handleCharacters(StAXStream2SAX.java:261)
at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.bridge(StAXStream2SAX.java:171)
at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.parse(StAXStream2SAX.java:120)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:674)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:723)
... 3 more
---------
java.lang.IndexOutOfBoundsException
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.getTextCharacters(XMLStreamReaderImpl.java:1143)
at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.handleCharacters(StAXStream2SAX.java:261)
at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.bridge(StAXStream2SAX.java:171)
at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.parse(StAXStream2SAX.java:120)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:674)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:723)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:336)
at xml_test.XML_Test.extractXML2(XML_Test.java:698)
at xml_test.XML_Test.main(XML_Test.java:811)
The code:
InputStream stream = new FileInputStream("C:\\myFile.xml");
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader reader = factory.createXMLStreamReader(stream);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();
String extractPath = "/root";
String path = "";
while (reader.hasNext()) {
    reader.next();
    if (reader.isStartElement()) {
        path += "/" + reader.getLocalName();
        if (path.equals(extractPath)) {
            StringWriter writer = new StringWriter();
            StAXSource src = new StAXSource(reader);
            StreamResult res = new StreamResult(writer);
            t.transform(src, res); // Exception thrown
            System.out.println(writer.toString());
            path = path.substring(0, path.lastIndexOf("/"));
        }
    }
    else if (reader.isEndElement()) {
        path = path.substring(0, path.lastIndexOf("/"));
    }
}
The XML that raises the error:
<foo><![CDATA[]]></foo>
Can I make the Transformer just ignore that? Or what would another implementation look like? I'm not able to change the input XML!
This is an issue in the Xerces implementation; see https://issues.apache.org/jira/browse/XERCESJ-1033
It seems that this implementation does not expect empty CDATA sections, so the only advice I can give you is:
- Preprocess the input and remove the empty sections (replace "<![CDATA[]]>" with "").
- Or make the section non-empty, for example <![CDATA[ ]]>.
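The preprocessing idea can be sketched as follows. This is a minimal illustration, not a tested fix for every JRE: the class name StripEmptyCdata and the helper stripEmptyCdata are hypothetical, the XML is inlined as a string for brevity, and the whole document is held in memory, which may not suit a really large file. The replacement is also purely textual, so it assumes the literal text <![CDATA[]]> never appears inside a comment or another CDATA section.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

class StripEmptyCdata {

    // Hypothetical helper: drops CDATA sections that have no content at all.
    // Plain text replacement, no XML parsing involved.
    static String stripEmptyCdata(String xml) {
        return xml.replace("<![CDATA[]]>", "");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><foo><![CDATA[]]></foo>"
                + "<foo><![CDATA[some data here]]></foo></root>";
        String cleaned = stripEmptyCdata(xml);

        // Same StAX-to-Transformer pipeline as in the question,
        // but fed with the cleaned text instead of the raw file.
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(cleaned));
        Transformer t = TransformerFactory.newInstance().newTransformer();
        StringWriter writer = new StringWriter();
        t.transform(new StAXSource(reader), new StreamResult(writer));
        System.out.println(writer);
    }
}
```

Since the input file itself stays untouched, this should be compatible with the constraint that the XML cannot be changed at the source.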
Here is an example with another implementation, JAXB.
With JAXB you can map your XML to POJOs in a simple manner.
For example, if you have the following XML file in c:\myFile.xml:
<root>
<foo><![CDATA[]]></foo>
<foo><![CDATA[some data here]]></foo>
</root>
You could have the following POJOs:
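import java.util.List;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement
public class Root {
    // Each <foo> child of <root> becomes one entry in this list.
    @XmlElement(name="foo")
    private List<Foo> foo;

    public List<Foo> getFooList() {
        return foo;
    }

    public void setFooList(List<Foo> fooList) {
        this.foo = fooList;
    }
}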
@XmlType(name = "foo")
public class Foo {
    @XmlValue
    private String content;

    @Override
    public String toString() {
        return content;
    }
}
And then parse the XML into objects with the following snippet:
public static void main(String[] args) {
    try {
        File file = new File("C:\\myFile.xml");
        JAXBContext jaxbContext = JAXBContext.newInstance(Root.class);
        Unmarshaller jaxbUnmarshaller = jaxbContext.createUnmarshaller();
        Root root = (Root) jaxbUnmarshaller.unmarshal(file);
        for (Foo foo : root.getFooList()) {
            System.out.println(String.format("Foo content: |%s|", foo));
        }
    } catch (JAXBException e) {
        e.printStackTrace();
    }
}
I tested this and it raises no error.
I encountered this error with two builds of the same application: one build exhibited the error when handling an empty <![CDATA[]]> and the other did not.
The difference turned out to be that the broken build was using Xerces (bundled with the JRE), while the working build had an extra dependency on the classpath, Woodstox: https://mvnrepository.com/artifact/org.codehaus.woodstox/woodstox-core-asl.
The relevant part of the stack trace for the broken build is:
java.lang.Exception
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.getTextCharacters(XMLStreamReaderImpl.java:1144)
at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.handleCharacters(StAXStream2SAX.java:242)
at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.bridge(StAXStream2SAX.java:152)
at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.parse(StAXStream2SAX.java:101)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:679)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:728)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:343)
at com.sun.org.apache.xerces.internal.jaxp.validation.StAXValidatorHelper.validate(StAXValidatorHelper.java:107)
at com.sun.org.apache.xerces.internal.jaxp.validation.ValidatorImpl.validate(ValidatorImpl.java:123)
at javax.xml.validation.Validator.validate(Validator.java:124)
While for the working build it is:
java.lang.Exception
at com.ctc.wstx.sr.BasicStreamReader.getTextCharacters(BasicStreamReader.java:894)
at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.handleCharacters(StAXStream2SAX.java:242)
at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.bridge(StAXStream2SAX.java:152)
at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.parse(StAXStream2SAX.java:101)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:679)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:728)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:343)
at com.sun.org.apache.xerces.internal.jaxp.validation.StAXValidatorHelper.validate(StAXValidatorHelper.java:107)
at com.sun.org.apache.xerces.internal.jaxp.validation.ValidatorImpl.validate(ValidatorImpl.java:123)
at javax.xml.validation.Validator.validate(Validator.java:124)
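To see which StAX implementation a given build actually picks up, it can help to print the class of the factory that XMLInputFactory.newInstance() returns. The sketch below uses only the standard javax.xml.stream API; the class name StaxImplementationCheck and the helper inputFactoryClassName are made up for illustration.

```java
import javax.xml.stream.XMLInputFactory;

class StaxImplementationCheck {

    // Returns the fully qualified class name of the StAX input factory
    // that the standard lookup mechanism resolves on this classpath.
    static String inputFactoryClassName() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("StAX input factory: " + inputFactoryClassName());
    }
}
```

Judging by the stack traces above, a build that resolves to Woodstox should report a class in the com.ctc.wstx package, while a build without it should fall back to the implementation bundled with the JRE.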
This Q/A helped me get "comfortable" with Woodstox: "What is the relation between fasterxml(jackson-dataformat-xml) and Woodstox?"