
Is DocumentBuilder thread safe?

The code base I am looking at uses the DOM parser. The following code fragment is duplicated in five methods:

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();

If a method containing the above code is called in a loop, or is called many times in the application, we pay the overhead of creating a new DocumentBuilderFactory instance and a new DocumentBuilder instance on every call.

Would it be a good idea to create a singleton wrapper around the DocumentBuilderFactory and DocumentBuilder instances, as shown below:

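    public final class DOMParser {
        private static final DOMParser instance = new DOMParser();

        private final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        private final DocumentBuilder builder;

        private DOMParser() {
            try {
                builder = factory.newDocumentBuilder();
            } catch (ParserConfigurationException e) {
                throw new IllegalStateException(e);
            }
        }

        public static DOMParser getInstance() {
            return instance;
        }

        // The single builder is shared by every caller of this method.
        public Document parse(InputSource xml) throws SAXException, IOException {
            return builder.parse(xml);
        }
    }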

Are there any problems that can arise if the above singleton is shared across multiple threads? If not, will there be any performance gain by using the above approach of creating the DocumentBuilderFactory and the DocumentBuilder instances only once throughout the lifetime of the application?

Edit:

As far as I can see, the only way this could cause a problem is if DocumentBuilder keeps state while parsing one XML file that can affect the parsing of the next XML file.

Chetan Kinger asked Sep 17 '12 08:09





2 Answers

See the comments section for other questions about the same matter. Short answer to your question: no, it's not OK to put these classes in a singleton. Neither DocumentBuilderFactory nor DocumentBuilder is guaranteed to be thread safe. If you have several threads parsing XML, make sure each thread has its own instance of DocumentBuilder. You only need one of them per thread, since you can reuse a DocumentBuilder after you reset it.
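One way to give each thread its own builder is a ThreadLocal. The following is only a minimal sketch, not a definitive implementation: the class name PerThreadDomParser and its parse helpers are made up for illustration, and ThreadLocal.withInitial assumes Java 8 or later. Each thread also creates its own factory here, so nothing is shared between threads.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper (the name is illustrative): one DocumentBuilder per thread.
final class PerThreadDomParser {

    // Each thread lazily creates its own factory and builder on first use,
    // so neither object is ever touched by two threads.
    private static final ThreadLocal<DocumentBuilder> BUILDER =
            ThreadLocal.withInitial(() -> {
                try {
                    return DocumentBuilderFactory.newInstance().newDocumentBuilder();
                } catch (ParserConfigurationException e) {
                    throw new IllegalStateException("Cannot create DocumentBuilder", e);
                }
            });

    private PerThreadDomParser() {
    }

    static Document parse(InputSource xml) throws Exception {
        DocumentBuilder builder = BUILDER.get();
        try {
            return builder.parse(xml);
        } finally {
            // Return the builder to its initial configuration before the next reuse.
            builder.reset();
        }
    }

    // Convenience overload for XML held in a String.
    static Document parse(String xml) throws Exception {
        return parse(new InputSource(new StringReader(xml)));
    }
}
```

Callers on any thread can then use PerThreadDomParser.parse(...) without extra locking. With a fixed-size thread pool this creates at most one builder per worker thread.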

EDIT: A small snippet to show that sharing the same DocumentBuilder is bad. With Java 1.6_u32 and 1.7_u05 this code fails with org.xml.sax.SAXException: FWK005 parse may not be called while parsing. Uncomment the synchronization on builder, and it works fine:

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    final DocumentBuilder builder = factory.newDocumentBuilder();

    ExecutorService exec = Executors.newFixedThreadPool(10);
    for (int i = 0; i < 10; i++) {
        exec.submit(new Runnable() {
            public void run() {
                try {
    //                synchronized (builder) {
                        InputSource is = new InputSource(new StringReader("<?xml version=\"1.0\" encoding=\"UTF-8\" ?><俄语>данные</俄语>"));
                        builder.parse(is);
                        builder.reset();
    //                }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
    }
    exec.shutdown();

So here's your answer: do not call DocumentBuilder.parse() on the same instance from multiple threads. This behavior might be JRE specific: if you're using IBM Java or JRockit, or you plug in a different DocumentBuilderImpl, it might work fine, but with the default Xerces implementation it does not.

Denis Tulskiy answered Sep 18 '22 01:09



The JAXP Specification (V 1.4) says:

It is expected that the newSAXParser method of a SAXParserFactory implementation, the newDocumentBuilder method of a DocumentBuilderFactory and the newTransformer method of a TransformerFactory will be thread safe without side effects. This means that an application programmer should expect to be able to create transformer instances in multiple threads at once from a shared factory without side effects or problems.

https://jaxp.java.net/docs/spec/html/#plugabililty-thread-safety

So, for example, you should be able to create a single DocumentBuilderFactory instance via DocumentBuilderFactory.newInstance and then use that single factory to create a DocumentBuilder per thread via DocumentBuilderFactory.newDocumentBuilder. You could also create a pool of DocumentBuilders.
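The pool idea could look roughly like this. It is a sketch under stated assumptions rather than a reference implementation: DocumentBuilderPool and its parse method are hypothetical names, and the pool is fixed-size and blocking. All builders are created up front from one factory, and each builder is handed to only one thread at a time.

```java
import java.io.StringReader;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical fixed-size pool (class and method names are illustrative).
final class DocumentBuilderPool {

    private final BlockingQueue<DocumentBuilder> idle;

    DocumentBuilderPool(int size) throws ParserConfigurationException {
        // One factory creates every builder in the pool.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.newDocumentBuilder());
        }
    }

    Document parse(String xml) throws Exception {
        // Blocks until a builder is free, so no builder is used by two threads at once.
        DocumentBuilder builder = idle.take();
        try {
            return builder.parse(new InputSource(new StringReader(xml)));
        } finally {
            builder.reset();
            // The queue capacity equals the pool size, so there is always room.
            idle.add(builder);
        }
    }
}
```

A pool caps the number of builders regardless of how many threads parse, at the cost of callers waiting when every builder is busy.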

I can't find anything that says that, for example, the static method DocumentBuilderFactory.newInstance is thread-safe. The implementation appears to be thread-safe, in that some methods are synchronized, but the spec only explicitly says that DocumentBuilderFactory.newDocumentBuilder is thread safe.

ttt answered Sep 20 '22 01:09
