As the title says it, I have a huge xml file (GBs)
<root>
<keep>
<stuff> ... </stuff>
<morestuff> ... </morestuff>
</keep>
<discard>
<stuff> ... </stuff>
<morestuff> ... </morestuff>
</discard>
</root>
and I'd like to transform it into a much smaller one which retains only a few of the elements.
My parser should do the following:
1. Parse through the file until a relevant element starts.
2. Copy the whole relevant element (with children) to the output file. go to 1.
step 1 is easy with SAX and impossible for DOM-parsers.
step 2 is annoying with SAX, but easy with the DOM-Parser or XSLT.
so what? - is there a neat way to combine SAX and DOM-Parser to do the task?
You can use default text editors, which come with your computer, like Notepad on Windows or TextEdit on Mac. All you have to do is locate the XML file, right-click the XML file, and select the "Open With" option. This will display a list of programs to open the file.
There are many methods you can use to transform XML documents including the XSLTRANSFORM function, an XQuery update expression, and XSLT processing by an external application server.
StAX would seem to be one obvious solution: it's a pull parser rather than either the "push" of SAX or the "buffer the whole thing" approach of DOM. Can't say I've used it though. A "StAX tutorial" search may come in handy :)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With