How to do command line XPath queries in huge XML files?

Tags: xml, xpath, xmllint

I have a collection of XML files, and some of them are pretty big (up to ~50 million element nodes). I am using xmllint for validating those files, which works pretty nicely even for the huge ones thanks to the streaming API.

xmllint --loaddtd --stream --valid /path/to/huge.xml

I recently learned that xmllint is also capable of doing command line XPath queries, which is very handy.

xmllint --loaddtd --xpath '/root/a/b/c/text()' /path/to/small.xml

However, these XPath queries do not work for the huge XML files. I just receive a "Killed" message after some time. I tried to enable the streaming API, but this just leads to no output at all.

xmllint --loaddtd --stream --xpath '/root/a/b/c/text()' /path/to/huge.xml

Is there a way to enable streaming mode when doing XPath queries using xmllint? Are there other/better ways to do command line XPath queries for huge XML files?

Asked May 18 '15 by MRA
1 Answer

If your XPath expressions are very simple, try xmlcutty.

From the homepage:

xmlcutty is a simple tool for carving out elements from large XML files, fast. Since it works in a streaming fashion, it uses almost no memory and can process around 1G of XML per minute.
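For example, carving out the c elements targeted by the xmllint query above might look like the following. This is only a sketch based on xmlcutty's documented -path option, and the element path is taken from the question; note that xmlcutty understands simple element paths rather than full XPath, so it emits the matching elements themselves rather than just their text content:

xmlcutty -path /root/a/b/c /path/to/huge.xml

Because it processes the input as a stream, memory use should stay small and roughly constant even for files with tens of millions of nodes.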

Answered Oct 19 '22 by gioele