
Is there any XPath processor for SAX model?

Tags: java, xml, xpath, sax

I'm looking for an XPath evaluator that doesn't build the whole DOM document in order to find nodes. The goal is to handle large amounts of XML data (ideally over 2 GB) with the SAX model, which is very good for memory management, while still being able to search for nodes.

Thank you all for the support!

For all those who say it's not possible: after asking the question, I found a project named "saxpath" (http://www.saxpath.org/), but I can't find any project that implements it.

user189603 asked Dec 07 '09 22:12


2 Answers

My current list (compiled from web search results and the other answers) is:

  • http://code.google.com/p/xpath4sax/
  • http://spex.sourceforge.net/
  • https://github.com/santhosh-tekuri/jlibs/wiki/XMLDog (also contains a performance chart)
  • http://www.cs.umd.edu/projects/xsq/ (university project, inactive for about 10 years, GPL)
  • MIT-Licensed approach http://softwareengineeringcorner.blogspot.com/2012/01/conveniently-processing-large-xml-files.html
  • Other parsers/memory models supporting fast XPath:
    • http://vtd-xml.sourceforge.net/ ("The world's fastest XPath 1.0 implementation.")
    • http://jaxen.codehaus.org/ (contains http://www.saxpath.org/)
    • http://www.saxonica.com/documentation/sourcedocs/streaming/streamable-xpath.html

The next step is to use the XMLDog examples to compare the performance of all these approaches. The test cases should then be extended to cover the XPath expressions that each approach supports.

2 revs, 2 users 96% answered Sep 29 '22 01:09


We regularly parse complex XML files of 1 GB and more by using a SAX parser which extracts partial DOM trees that can be conveniently queried using XPath. I blogged about it here: http://softwareengineeringcorner.blogspot.com/2012/01/conveniently-processing-large-xml-files.html - The sources are available on GitHub under the MIT License.
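
To illustrate the general idea (this is not the code from the blog post), here is a minimal sketch that uses only the JDK's built-in SAX, DOM and XPath APIs. The class name PartialDomSax, the handler SubtreeHandler and the method select are made up for this example. It assumes the document consists of many repeated "record" elements that are each small enough to hold in memory, and it ignores namespaces.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: streams with SAX, builds a DOM only for each record element.
public class PartialDomSax {

    /** Builds a small DOM tree per element named recordName and hands it to sink. */
    static class SubtreeHandler extends DefaultHandler {
        private final String recordName;
        private final Consumer<Element> sink;
        private final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        private Document doc;   // non-null only while we are inside a record
        private Node current;   // node that receives the next child

        SubtreeHandler(String recordName, Consumer<Element> sink) {
            this.recordName = recordName;
            this.sink = sink;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (doc == null) {
                if (!qName.equals(recordName)) {
                    return; // outside any record: nothing is kept in memory
                }
                try {
                    doc = dbf.newDocumentBuilder().newDocument();
                } catch (ParserConfigurationException e) {
                    throw new IllegalStateException(e);
                }
                current = doc;
            }
            Element element = doc.createElement(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                element.setAttribute(atts.getQName(i), atts.getValue(i));
            }
            current.appendChild(element);
            current = element;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (doc != null) {
                current.appendChild(doc.createTextNode(new String(ch, start, length)));
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (doc == null) {
                return;
            }
            Node parent = current.getParentNode();
            if (parent == doc) {
                // The record is complete: pass it on, then drop the tree.
                sink.accept((Element) current);
                doc = null;
                current = null;
            } else {
                current = parent;
            }
        }
    }

    /** Evaluates xpathExpr (as a string) against every record and collects the results. */
    static List<String> select(String xml, String recordName, String xpathExpr) throws Exception {
        List<String> results = new ArrayList<>();
        XPathExpression expr = XPathFactory.newInstance().newXPath().compile(xpathExpr);
        SubtreeHandler handler = new SubtreeHandler(recordName, record -> {
            try {
                results.add(expr.evaluate(record));
            } catch (XPathExpressionException e) {
                throw new IllegalArgumentException(e);
            }
        });
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return results;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library><book id=\"1\"><title>A</title></book>"
                + "<book id=\"2\"><title>B</title></book></library>";
        System.out.println(select(xml, "book", "title")); // prints [A, B]
    }
}
```

For a real multi-gigabyte file you would pass a File or InputStream to the parser instead of a string, and process each record inside the callback rather than collecting everything in a list. Memory use is then bounded by the largest single record, not by the size of the file.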

Andreas Haufler answered Sep 29 '22 00:09