I've got a tool that outputs a JSON record on each line, and I'd like to process it with jq.
The output looks something like this:
{"ts":"2017-08-15T21:20:47.029Z","id":"123","elapsed_ms":10}
{"ts":"2017-08-15T21:20:47.044Z","id":"456","elapsed_ms":13}
When I pass this to jq as follows:
./tool | jq 'group_by(.id)'
...it outputs an error:
jq: error (at <stdin>:1): Cannot index string with string "id"
How do I get jq to handle JSON-record-per-line data?
The slurp option ( -s ) changes how input is delivered to the jq program: it reads all the input values and builds them into a single array, which becomes the query's input. Combined with the raw-input option ( -R ), it reads the entire input as one string. The inputs builtin is a special stream that emits the remaining JSON values given to the jq program.
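A quick sketch of the difference, using printf in place of a real tool:

```shell
# Two JSON records, one per line, stand in for the tool's output.
printf '%s\n' '{"id":"123"}' '{"id":"456"}' |
  jq -s 'length'        # -s slurps both objects into one array: prints 2

# The same input read raw: -R -s yields a single string of all the bytes.
printf '%s\n' '{"id":"123"}' '{"id":"456"}' |
  jq -R -s 'length'     # length of the whole input string: prints 26
```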
Use the --slurp (or -s ) switch:
./tool | jq --slurp 'group_by(.id)'
It outputs the following:
[
[
{
"ts": "2017-08-15T21:20:47.029Z",
"id": "123",
"elapsed_ms": 10
}
],
[
{
"ts": "2017-08-15T21:20:47.044Z",
"id": "456",
"elapsed_ms": 13
}
]
]
...which you can then process further. For example:
./tool | jq -s 'group_by(.id) | map({id: .[0].id, count: length})'
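Run against the two sample records from the question (with printf standing in for ./tool, and -c for compact output), that pipeline produces:

```shell
# Feed the two sample records to the group-and-count pipeline.
printf '%s\n' \
  '{"ts":"2017-08-15T21:20:47.029Z","id":"123","elapsed_ms":10}' \
  '{"ts":"2017-08-15T21:20:47.044Z","id":"456","elapsed_ms":13}' |
  jq -c -s 'group_by(.id) | map({id: .[0].id, count: length})'
# [{"id":"123","count":1},{"id":"456","count":1}]
```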
As @JeffMercado pointed out, jq handles streams of JSON just fine, but if you use group_by , then you'd have to ensure its input is an array. That could be done in this case using the -s command-line option; if your jq has the inputs filter, then it can also be done using that filter in conjunction with the -n option.
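The inputs-based equivalent of -s looks like this (a sketch, again with printf in place of ./tool): -n suppresses the normal reading of the first input, and [inputs] collects the whole stream into an array for group_by.

```shell
# [inputs] under -n plays the same role as -s: build one array from the stream.
printf '%s\n' '{"id":"123","elapsed_ms":10}' '{"id":"456","elapsed_ms":13}' |
  jq -c -n '[inputs] | group_by(.id)'
# [[{"id":"123","elapsed_ms":10}],[{"id":"456","elapsed_ms":13}]]
```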
If you have a version of jq with inputs (which is available in jq 1.5), however, then a better approach would be to use the following streaming variant of group_by :
# sort-free stream-oriented variant of group_by/1
# f should always evaluate to a string.
# Output: a stream of arrays, one array per group
def GROUPS_BY(stream; f): reduce stream as $x ({}; .[$x|f] += [$x] ) | .[] ;
Usage example: GROUPS_BY(inputs; .id)
Note that you will want to use this with the -n command-line option.
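A complete invocation might look like this (a sketch, with printf standing in for ./tool; GROUPS_BY emits one array per distinct .id, shown compactly with -c):

```shell
# Three records, two distinct ids; GROUPS_BY emits one array per group.
printf '%s\n' '{"id":"123"}' '{"id":"123"}' '{"id":"456"}' |
  jq -c -n '
    def GROUPS_BY(stream; f): reduce stream as $x ({}; .[$x|f] += [$x]) | .[];
    GROUPS_BY(inputs; .id)'
# [{"id":"123"},{"id":"123"}]
# [{"id":"456"}]
```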
Such a streaming variant has two main advantages: it does not require the entire input to be slurped into memory as a single array, and it avoids the sort performed by group_by/1 . Please note that the above definition of GROUPS_BY/2 follows the convention for such streaming filters in that it produces a stream. Other variants are of course possible.
The following illustrates how to economize on memory. Suppose the task is to produce a frequency count of .id values. The humdrum solution would be:
GROUPS_BY(inputs; .id) | [(.[0]|.id), length]
A more economical and indeed far better solution would be:
GROUPS_BY(inputs|.id; .) | [.[0], length]
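To see why the second form is more economical: only the .id strings are accumulated, not the full records. A sketch with printf supplying three records over two ids:

```shell
# Only the ids flow into the reduce, so each group holds strings, not records.
printf '%s\n' '{"id":"a"}' '{"id":"b"}' '{"id":"a"}' |
  jq -c -n '
    def GROUPS_BY(stream; f): reduce stream as $x ({}; .[$x|f] += [$x]) | .[];
    GROUPS_BY(inputs | .id; .) | [.[0], length]'
# ["a",2]
# ["b",1]
```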