
xmlstarlet sel on large file

The command

$ xmlstarlet sel -t -c "/collection/record" file.xml

seems to load the whole file into memory before applying the given XPath expression. This is not usable for large XML files.

Does xmlstarlet provide a streaming mode to extract subelements from a large (100G+) XML file?

asked Nov 11 '15 by miku

2 Answers

Since I only needed a tiny subset of XPath for large XML files, I implemented a little tool myself: xmlcutty.

The example from my question could be written like this:

$ xmlcutty -path /collection/record file.xml
answered Sep 20 '22 by miku


xmlstarlet translates all (or most) operations into XSLT transformations, so the short answer is no.

You could try STX (Streaming Transformations for XML), a streaming transformation language similar to XSLT. On the other hand, if you don't care about XML that much, just coding something together in Python using SAX or iterparse may be easier and faster (with respect to the time needed to write the code).
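A minimal iterparse sketch of that idea, assuming the same `/collection/record` layout as in the question (the function name and the `record` tag are placeholders for illustration):

```python
import sys
import xml.etree.ElementTree as ET

def extract_records(path, tag="record"):
    """Stream matching elements from a large XML file without
    loading the whole document into memory."""
    # iterparse yields (event, element) pairs as the file is read;
    # with the "end" event, an element is complete when we see it.
    for event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == tag:
            yield ET.tostring(elem, encoding="unicode")
            # Free the children of this record so memory stays bounded
            # while the rest of the file is parsed.
            elem.clear()

if __name__ == "__main__":
    for record in extract_records(sys.argv[1]):
        print(record)
```

For a 100G+ file you would likely also clear the root element's processed children periodically, but even this simple version avoids building the full tree for each record's siblings.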

answered Sep 19 '22 by marbu