I have a collection of XML files, and some of them are pretty big (up to ~50 million element nodes). I am using xmllint for validating those files, which works pretty nicely even for the huge ones thanks to the streaming API.
xmllint --loaddtd --stream --valid /path/to/huge.xml
I recently learned that xmllint is also capable of doing command line XPath queries, which is very handy.
xmllint --loaddtd --xpath '/root/a/b/c/text()' /path/to/small.xml
However, these XPath queries do not work for the huge XML files. I just receive a "Killed" message after some time. I tried to enable the streaming API, but this just leads to no output at all.
xmllint --loaddtd --stream --xpath '/root/a/b/c/text()' /path/to/huge.xml
Is there a way to enable streaming mode when doing XPath queries using xmllint? Are there other/better ways to do command line XPath queries for huge XML files?
If your XPath expressions are very simple, try xmlcutty.
From the homepage:
xmlcutty is a simple tool for carving out elements from large XML files, fast. Since it works in a streaming fashion, it uses almost no memory and can process around 1G of XML per minute.
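If installing another tool is not an option, the same streaming idea can be sketched with Python's standard library. This is only a rough sketch under a few assumptions, not how xmllint or xmlcutty work internally: the helper name `iter_text` is made up for illustration, it supports only plain child steps such as /root/a/b/c (no predicates, wildcards, or namespaces handling), it yields only the text that precedes any child element, and unlike --loaddtd it does not load an external DTD, so entities defined there would not resolve.

```python
import io
import xml.etree.ElementTree as ET


def iter_text(source, path):
    """Yield the text of every element whose absolute path equals `path`.

    `source` is a filename or a binary file object.
    `path` is a list of tag names from the root down,
    e.g. ["root", "a", "b", "c"] for /root/a/b/c.
    """
    open_elems = []  # elements that have started but not yet ended
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            open_elems.append(elem)
            continue
        # "end" event: elem is the innermost open element and its text is complete
        if len(open_elems) == len(path) and all(
            e.tag == tag for e, tag in zip(open_elems, path)
        ):
            yield elem.text or ""
        open_elems.pop()
        # Detach the finished element from its parent so the tree never grows
        if open_elems:
            open_elems[-1].remove(elem)


if __name__ == "__main__":
    # Tiny in-memory stand-in for /path/to/huge.xml
    sample = io.BytesIO(
        b"<root><a><b><c>one</c><c>two</c></b></a><c>skip</c></root>"
    )
    for text in iter_text(sample, ["root", "a", "b", "c"]):
        print(text)
    # prints "one" and "two", each on its own line
```

Because every element is detached from its parent as soon as it closes, memory use depends on the nesting depth rather than on the file size, which is what makes this usable for the huge files. For a real file, pass the path instead of the BytesIO object, e.g. iter_text("/path/to/huge.xml", ["root", "a", "b", "c"]).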