Why is SAX parsing faster than DOM parsing? And how does StAX work?

somewhat related to: libxml2 from java

yes, this question is rather long-winded - sorry. I kept it as dense as I felt possible. I bolded the questions to make it easier to peek at before reading the whole thing.

Why is sax parsing faster than dom parsing? The only thing I can come up with is that w/ sax you're probably ignoring the majority of the incoming data, and thus not wasting time processing parts of the xml you don't care about. IOW - after parsing w/ SAX, you can't recreate the original input. If you wrote your SAX parser so that it accounted for each and every xml node (and could thus recreate the original), then it wouldn't be any faster than DOM, would it?

The reason I'm asking is that I'm trying to parse xml documents more quickly. I need to have access to the entire xml tree AFTER parsing. I am writing a platform for 3rd party services to plug into, so I can't anticipate what parts of the xml document will be needed and which parts won't. I don't even know the structure of the incoming document. This is why I can't use jaxb or sax. Memory footprint isn't an issue for me because the xml documents are small and I only need 1 in memory at a time. It's the time it takes to parse this relatively small xml document that is killing me. I haven't used stax before, but perhaps I need to investigate further because it might be the middle ground? If I understand correctly, stax keeps the original xml structure and processes the parts that I ask for on demand? In this way, the original parse time might be quick, but each time I ask it to traverse part of the tree it hasn't yet traversed, that's when the processing takes place?

If you provide a link that answers most of the questions, I will accept your answer (you don't have to directly answer my questions if they're already answered elsewhere).

update: I rewrote it in sax and it now parses documents in 2.1 ms on average. This is an improvement (16% faster) over the 2.5 ms that dom was taking; however, it is not the magnitude that I (et al) would've guessed.

Thanks

asked Sep 29 '10 19:09

andersonbd1

3 Answers

Assuming you do nothing but parse the document, the ranking of the different parser standards is as follows:

1. StAX is the fastest

Only the event is reported to you; you ask for the content (element name, namespace, attributes, ...) when you need it

2. SAX is next

It does everything StAX does plus the content is realized automatically (element name, namespace, attributes, ...)

3. DOM is last

It does everything SAX does and presents the information as an instance of Node.
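To make the StAX item in the ranking concrete, here is a minimal sketch using the cursor API (XMLStreamReader). The class and method names are made up for illustration, and this shows the API shape only, it is not a benchmark. Each call to next() hands back just an int event code, and the element name is only fetched when the code asks for it. A SAX handler, by contrast, always receives the URI, names and attributes as arguments to startElement.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, for illustration only.
class StaxEventCount {

    // Returns "<root name>:<number of start tags>".
    // The element name is requested for the root only; every other
    // start tag is counted from its event code alone.
    static String rootNameAndCount(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            String root = null;
            int startTags = 0;
            while (reader.hasNext()) {
                int event = reader.next();              // only an int event code
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if (root == null) {
                        root = reader.getLocalName();   // content pulled on demand
                    }
                    startTags++;
                }
            }
            reader.close();
            return root + ":" + startTags;
        } catch (XMLStreamException ex) {
            throw new IllegalArgumentException("malformed XML", ex);
        }
    }

    public static void main(String[] args) {
        // prints order:3
        System.out.println(rootNameAndCount("<order><item/><item/></order>"));
    }
}
```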

Your Use Case

If you need to maintain all of the XML, DOM is the standard representation. It integrates cleanly with XSLT transforms (javax.xml.transform), XPath (javax.xml.xpath), and schema validation (javax.xml.validation) APIs. However if performance is key, you may be able to build your own tree structure using StAX faster than a DOM parser could build a DOM.
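As a rough sketch of what "build your own tree structure using StAX" could look like: the Element class below is a made-up minimal node type (not a standard API), and the builder deliberately ignores namespaces, comments and processing instructions, and lumps all text of an element together, so mixed content loses its ordering. Whether this actually beats a DOM parser for your documents is something you would have to measure.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical names throughout; a sketch, not a DOM replacement.
class StaxTreeBuilder {

    // Minimal node: name, attributes, child elements, concatenated text.
    static final class Element {
        final String name;
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<Element> children = new ArrayList<>();
        final StringBuilder text = new StringBuilder();

        Element(String name) {
            this.name = name;
        }
    }

    static Element parse(String xml) {
        try {
            XMLStreamReader r =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            Deque<Element> open = new ArrayDeque<>();   // stack of currently open elements
            Element root = null;
            while (r.hasNext()) {
                switch (r.next()) {
                    case XMLStreamConstants.START_ELEMENT: {
                        Element el = new Element(r.getLocalName());
                        for (int i = 0; i < r.getAttributeCount(); i++) {
                            el.attributes.put(r.getAttributeLocalName(i), r.getAttributeValue(i));
                        }
                        if (open.isEmpty()) {
                            root = el;
                        } else {
                            open.peek().children.add(el);
                        }
                        open.push(el);
                        break;
                    }
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        // Text may arrive in several chunks, so append.
                        if (!open.isEmpty()) {
                            open.peek().text.append(r.getText());
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        open.pop();
                        break;
                    default:
                        break;                          // comments, PIs etc. are skipped
                }
            }
            r.close();
            return root;
        } catch (XMLStreamException ex) {
            throw new IllegalArgumentException("malformed XML", ex);
        }
    }

    public static void main(String[] args) {
        Element root = parse("<order id=\"7\"><item>pen</item><item>ink</item></order>");
        // prints order 7 2 ink
        System.out.println(root.name + " " + root.attributes.get("id") + " "
                + root.children.size() + " " + root.children.get(1).text);
    }
}
```

The tree is fully materialized after parse() returns, so 3rd party code can walk it later without knowing the document structure up front.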

answered Oct 01 '22 05:10

bdoughan

DOM parsing requires you to load the entire document into memory and then traverse a tree to find the information you want.
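For reference, a small sketch of that DOM pattern with the standard javax.xml.parsers API: the whole document is parsed into a Document first, and only then is the tree searched. The class name, the method name and the "item" element are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, for illustration only.
class DomItems {

    // Parses the complete document into memory, then walks the tree.
    static List<String> itemTexts(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList items = doc.getElementsByTagName("item");
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                texts.add(items.item(i).getTextContent());
            }
            return texts;
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("could not parse XML", ex);
        }
    }

    public static void main(String[] args) {
        // prints [pen, ink]
        System.out.println(itemTexts("<order><item>pen</item><item>ink</item></order>"));
    }
}
```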

SAX only requires as much memory as you need to do basic IO, and you can extract the information that you need as the document is being read. Because SAX is stream oriented, you can even process a file which is still being written by another process.
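A minimal sketch of extracting information while the document is being read, using the standard SAX API. The class name and the "item" element are made up, and the handler assumes "item" elements are not nested inside each other. No tree is ever built; the handler keeps only the text it cares about.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class SaxItems {

    // Collects the text of every <item> as the parser streams through the input.
    static List<String> itemTexts(String xml) {
        List<String> texts = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder current;   // non-null only while inside <item>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("item".equals(qName)) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (current != null) {
                    current.append(ch, start, length);   // may be called several times per element
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("item".equals(qName) && current != null) {
                    texts.add(current.toString());
                    current = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("could not parse XML", ex);
        }
        return texts;
    }

    public static void main(String[] args) {
        // prints [pen, ink]
        System.out.println(itemTexts("<order><item>pen</item><note>x</note><item>ink</item></order>"));
    }
}
```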

answered Oct 01 '22 07:10

mikerobi

SAX is faster because DOM parsers often use a SAX parser to parse a document internally, then do the extra work of creating and manipulating objects to represent each and every node, even if the application doesn't care about them.

An application that uses SAX directly is likely to utilize the information set more efficiently than a DOM "parser" does.

StAX is a happy medium where an application gets a more convenient API than SAX's event-driven approach, yet doesn't suffer the inefficiency of creating a complete DOM.
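To illustrate the "more convenient API" point, here is a small sketch using the StAX event iterator API (XMLEventReader). Class and method names are made up. Because the application drives the loop, it can simply return as soon as it has what it wants, with no handler state to manage. Note that getElementText() throws if the matched element contains child elements, so this assumes a text-only element.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class, for illustration only.
class StaxFirstMatch {

    // Returns the text of the first element with the given local name,
    // or null if there is none. Stops reading at the first match.
    static String firstText(String xml, String elementName) {
        try {
            XMLEventReader events =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            try {
                while (events.hasNext()) {
                    XMLEvent event = events.nextEvent();   // the application pulls each event
                    if (event.isStartElement()
                            && event.asStartElement().getName().getLocalPart().equals(elementName)) {
                        return events.getElementText();    // reads up to the matching end tag
                    }
                }
                return null;
            } finally {
                events.close();
            }
        } catch (XMLStreamException ex) {
            throw new IllegalArgumentException("malformed XML", ex);
        }
    }

    public static void main(String[] args) {
        // prints pen
        System.out.println(firstText("<order><item>pen</item><item>ink</item></order>", "item"));
    }
}
```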

answered Oct 01 '22 06:10

erickson

Tags: java, dom, xml, stax, sax