 

How to transform huge XML files in Java?

Tags: java, parsing, xml

As the title says, I have a huge XML file (several GB):

<root>  
<keep>  
   <stuff>  ...  </stuff>  
   <morestuff> ... </morestuff>  
</keep>  
<discard>  
   <stuff>  ...  </stuff>  
   <morestuff> ... </morestuff>
</discard>  
</root>  

I'd like to transform it into a much smaller file that retains only a few of the elements (the <keep> elements in the example).
My parser should do the following:
1. Parse through the file until a relevant element starts.
2. Copy the whole relevant element (with its children) to the output file, then go back to step 1.

Step 1 is easy with SAX and impossible with DOM parsers, which build the whole document tree in memory first.
Step 2 is annoying with SAX, but easy with a DOM parser or XSLT.
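To make the trade-off concrete, here is a minimal sketch (not the asker's code) that does both steps with SAX alone. Step 1 is just a name check in startElement; step 2 means re-serializing every start tag, attribute, text node and end tag by hand. The element name "keep", the wrapper element and the class name SaxKeepCopier are assumptions for illustration, and the escaping is deliberately minimal (comments and processing instructions are dropped, CDATA content is written as escaped text).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Writer;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX-only filter: copies each element named keepName, by hand.
class SaxKeepCopier extends DefaultHandler {
    private final String keepName; // element to copy, e.g. "keep" (assumption)
    private final Writer out;
    private int depth = 0;         // > 0 while inside a kept element

    SaxKeepCopier(String keepName, Writer out) {
        this.keepName = keepName;
        this.out = out;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (depth == 0 && !qName.equals(keepName)) {
            return; // step 1: still looking for a relevant element
        }
        depth++;
        // step 2: rebuild the start tag, including attributes, by hand
        StringBuilder sb = new StringBuilder("<").append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            sb.append(' ').append(atts.getQName(i)).append("=\"")
              .append(escape(atts.getValue(i))).append('"');
        }
        write(sb.append('>').toString());
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (depth == 0) return; // end tag of a skipped element
        write("</" + qName + ">");
        depth--;                // back to 0 at the kept element's own end tag
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (depth > 0) write(escape(new String(ch, start, length)));
    }

    private void write(String s) throws SAXException {
        try {
            out.write(s);
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    // Minimal escaping for text and double-quoted attribute values.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Streams the input once and writes every kept element inside <rootName>. */
    static void copy(InputStream in, Writer out, String keepName, String rootName)
            throws IOException, SAXException, ParserConfigurationException {
        out.write("<" + rootName + ">");
        SAXParserFactory.newInstance().newSAXParser().parse(in, new SaxKeepCopier(keepName, out));
        out.write("</" + rootName + ">");
        out.flush();
    }
}
```

Every output detail (escaping, namespaces, the wrapper element) becomes the handler's job, which is exactly the annoyance described for step 2.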

So, is there a neat way to combine SAX and a DOM parser to do this task?
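One possible combination, sketched here under assumptions rather than taken from the thread, keeps a streaming parser for step 1 but uses StAX instead of SAX, because javax.xml.transform.stax.StAXSource accepts an XMLStreamReader positioned on a START_ELEMENT. With the JDK's built-in identity Transformer, the transform then copies just that element's subtree into a small DOM (step 2), so only one kept element is in memory at a time. The class name StaxToDom and the keepName parameter are hypothetical.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;
import org.w3c.dom.Document;

class StaxToDom {
    /** Streams the input and returns one small DOM Document per element named keepName. */
    static List<Document> extract(InputStream in, String keepName)
            throws XMLStreamException, TransformerException {
        XMLStreamReader xsr = XMLInputFactory.newInstance().createXMLStreamReader(in);
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        List<Document> docs = new ArrayList<>();
        while (xsr.hasNext()) {
            if (xsr.getEventType() == XMLStreamConstants.START_ELEMENT
                    && xsr.getLocalName().equals(keepName)) {
                // Step 2: the identity transform reads this element's subtree into a DOM.
                DOMResult result = new DOMResult();
                identity.transform(new StAXSource(xsr), result);
                docs.add((Document) result.getNode());
                // Where the reader is left afterwards (on the end tag or just past it)
                // is implementation-dependent, so re-check the current event
                // instead of calling next() here.
            } else {
                xsr.next(); // Step 1: skip events until a relevant element starts
            }
        }
        xsr.close();
        return docs;
    }
}
```

Each returned Document can then be handled with ordinary DOM or XSLT code; collecting them in a list is only for illustration, and for a multi-GB input you would process each one as soon as it is produced.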

asked May 05 '10 at 13:05 by user306708





1 Answer

StAX would seem to be one obvious solution: it's a pull parser rather than either the "push" of SAX or the "buffer the whole thing" approach of DOM. Can't say I've used it though. A "StAX tutorial" search may come in handy :)

answered Oct 22 '22 at 12:10 by Jon Skeet

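The StAX suggestion could look roughly like this minimal sketch (an illustration, not code from the answer). An XMLEventReader pulls events one at a time; events are skipped until an element named keep starts, then copied to an XMLEventWriter until the matching end tag, with a depth counter to handle nested elements. The element names keep and kept and the class name StaxKeepFilter are assumptions.

```java
import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class StaxKeepFilter {
    /**
     * Copies every element named keepName (with its subtree) from in to out,
     * wrapped in a new root element named outRoot. Everything else is skipped.
     */
    static void filter(InputStream in, OutputStream out, String keepName, String outRoot)
            throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out, "UTF-8");
        XMLEventFactory events = XMLEventFactory.newInstance();

        writer.add(events.createStartDocument());
        writer.add(events.createStartElement("", "", outRoot));
        int depth = 0; // nesting depth inside the current kept element; 0 = not copying
        while (reader.hasNext()) {
            XMLEvent e = reader.nextEvent();
            if (depth == 0) {
                // Step 1: skip everything until a relevant element starts.
                if (e.isStartElement()
                        && e.asStartElement().getName().getLocalPart().equals(keepName)) {
                    writer.add(e);
                    depth = 1;
                }
            } else {
                // Step 2: copy each event of the kept subtree as-is.
                writer.add(e);
                if (e.isStartElement()) {
                    depth++;
                } else if (e.isEndElement()) {
                    depth--; // reaches 0 at the kept element's own end tag
                }
            }
        }
        writer.add(events.createEndElement("", "", outRoot));
        writer.add(events.createEndDocument());
        writer.flush();
        writer.close();
        reader.close();
    }
}
```

Because events are copied rather than re-serialized by hand, attributes, comments and processing instructions inside a kept element pass through unchanged, and memory use stays flat regardless of input size. Namespace declarations made on ancestors outside a kept element are not re-declared, so namespaced input would need extra handling.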