 

Processing huge json-array files with jq

Tags: json, jq

I have a huge (~7GB) JSON array of relatively small objects.

Is there a relatively simple way to filter these objects without loading the whole file into memory?

The --stream option looks suitable, but I can't figure out how to fold the stream of [path, value] pairs back into the original objects.

asked Aug 24 '15 by Dmitry Ermolov



1 Answer

jq 1.5 has a streaming parser. The jq FAQ gives an example of how to convert a top-level array of JSON objects into a stream of its elements:

$ echo '[{"foo":"bar"},{"foo":"baz"}]' |
  jq -nc --stream 'fromstream(1|truncate_stream(inputs))'
{"foo":"bar"}
{"foo":"baz"}

That may be enough for your purposes, but it is worth noting that setpath/2 can also be helpful. Here's how to produce a stream of leaflets (minimal objects, one per leaf path):

jq -c --stream '. as $in | select(length == 2) | {}|setpath($in[0]; $in[1])'
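For instance, on a small hypothetical input (not from the original question), the filter above emits one single-path object per leaf:

```shell
# Each [path, value] event of length 2 is a leaf; wrap it
# back into a minimal object rooted at its full path.
echo '{"a":{"b":1},"c":2}' |
  jq -c --stream '. as $in | select(length == 2) | {} | setpath($in[0]; $in[1])'
# → {"a":{"b":1}}
# → {"c":2}
```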

Further information and documentation is available in the jq manual: https://stedolan.github.io/jq/manual/#Streaming
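Putting the pieces together for the original question, here is a sketch (the filename and the .foo test are hypothetical stand-ins): reconstruct each top-level element with fromstream/truncate_stream, then filter it with an ordinary select, so only one object is materialized at a time rather than the whole 7GB array.

```shell
# Stream huge.json, rebuild each array element, and keep
# only the objects matching the (hypothetical) condition.
jq -cn --stream \
  'fromstream(1|truncate_stream(inputs)) | select(.foo == "bar")' \
  huge.json
```

The same pipeline works with input piped on stdin, e.g. from a decompressor such as `zcat huge.json.gz | jq -cn --stream ...`.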

answered Sep 23 '22 by peak