I'm using the standard javax.xml package to parse some XML files on a Linux machine. My code is as follows:
try
{
    // Prepare parser
    DocumentBuilder documentBuilder = documentBuilderFactory
            .newDocumentBuilder();
    Document document = documentBuilder.parse(file.getAbsolutePath()); // This is line 397
    XPath xPath = xPathFactory.newXPath();
    ...
}
catch(IOException e) { ... }
A single DocumentBuilderFactory is shared by multiple threads, as is a single XPathFactory; I believe this to be acceptable usage. I occasionally see the following error when parsing an XML file with the above code:
java.io.IOException: Bad file descriptor
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:229)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:229)
at java.io.BufferedInputStream.read(BufferedInputStream.java:246)
at org.apache.xerces.impl.XMLEntityManager$RewindableInputStream.read(Unknown Source)
at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
at mypackage.MyXmlParser.parseFile(MyXmlParser.java:397)
at mypackage.MyXmlParser.access$500(MyXmlParser.java:51)
at mypackage.MyXmlParser$1.call(MyXmlParser.java:337)
at mypackage.MyXmlParser$1.call(MyXmlParser.java:328)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:284)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:665)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:690)
at java.lang.Thread.run(Thread.java:799)
I occasionally (~10% of the time) see the following additional text:
Caused by:
java.io.IOException: Bad file descriptor
at org.apache.xml.serializer.ToStream.flushWriter(ToStream.java:260)
at org.apache.xml.serializer.ToXMLStream.endDocument(ToXMLStream.java:191)
at org.apache.xalan.transformer.TransformerIdentityImpl.endDocument(TransformerIdentityImpl.java:983)
at org.apache.xml.serializer.TreeWalker.traverse(TreeWalker.java:174)
at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:410)
... 9 more
When I inspect the files manually, I can see no difference between the files that fail and the files that pass. I can confirm the files that pass are valid XML and have no special characters or premature endings.
Does anyone know why this might be happening, and how I can avoid it?
> java -version
java version "1.5.0"
Java(TM) 2 Runtime Environment, Standard Edition (build pxa64dev-20061002a (SR3) )
IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Linux amd64-64 j9vmxa6423-20061001 (JIT enabled)
J9VM - 20060915_08260_LHdSMr
JIT - 20060908_1811_r8
GC - 20060906_AA)
JCL - 20061002
It looks like an issue with concurrent threads.
The error may lie somewhere outside the snippet you have shown us. I am also not sure whether DocumentBuilderFactory and XPathFactory are thread-safe; I could not find a guarantee in the documentation.
As a first test, I recommend putting all of the XML parsing code inside a synchronized block. If this solves your problem, then it is definitely a multithreading problem. In that case, you then have to find the smallest section of code that must be synchronized.
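A minimal sketch of that first test, assuming the parsing can be moved into one static helper. The class name SynchronizedXmlParser and the PARSER_LOCK field are made up for illustration and are not from the question's code. Every worker thread has to go through the same lock object, otherwise the test proves nothing. As far as I know, the Javadoc for XPathFactory does state that the class is not thread-safe, so any XPath work on the shared factory would belong inside the same block.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class SynchronizedXmlParser {
    // Hypothetical shared lock: all threads must synchronize on this same object.
    private static final Object PARSER_LOCK = new Object();

    // Single shared factory, as in the question.
    private static final DocumentBuilderFactory FACTORY =
            DocumentBuilderFactory.newInstance();

    public static Document parseFile(File file)
            throws ParserConfigurationException, SAXException, IOException {
        // Deliberately coarse: only one thread at a time creates a builder and parses.
        synchronized (PARSER_LOCK) {
            DocumentBuilder builder = FACTORY.newDocumentBuilder();
            return builder.parse(file);
        }
    }
}
```

If the "Bad file descriptor" errors disappear with this in place, shrink the synchronized block step by step (for example, guard only the newDocumentBuilder() call) until the errors come back. If they persist even with the lock, the cause is more likely outside this code, for instance another thread closing a stream.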