
Parsing JSON record-per-line with jq?

I've got a tool that outputs a JSON record on each line, and I'd like to process it with jq.

The output looks something like this:

{"ts":"2017-08-15T21:20:47.029Z","id":"123","elapsed_ms":10}
{"ts":"2017-08-15T21:20:47.044Z","id":"456","elapsed_ms":13}

When I pass this to jq as follows:

./tool | jq 'group_by(.id)'

...it outputs an error:

jq: error (at <stdin>:1): Cannot index string with string "id"

How do I get jq to handle JSON-record-per-line data?

asked Aug 16 '17 by Roger Lipscombe

People also ask

What is jq slurp?

The slurp option (-s) changes how input is given to the jq program: it reads all the input values and builds an array to use as the query input. Combined with the raw-input option (-R), it reads the entire input as a single string. The inputs function is a special stream that emits the remaining JSON values given to the jq program.
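For example, on a tiny two-value stream, -s collects the values into an array, while -n with inputs consumes them one at a time:

printf '1\n2\n' | jq -s '.'        # outputs the array [1, 2] (pretty-printed)
printf '1\n2\n' | jq -n 'inputs'   # outputs 1, then 2, as separate values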

How do I parse JSON files in Linux command line with jq?

jq uses filters to parse JSON, and the simplest of these filters is a period (.), which means "print the entire object." By default, jq pretty-prints the output. Putting it all together:

curl -s http://api.open-notify.org/iss-now.json | jq .

That's much better!
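Slightly less trivially, a filter can pick out a single field. For example:

echo '{"name": "jq"}' | jq '.name'

...prints "jq" (with quotes, since the value is a JSON string).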

What is jq parser?

jq is a lightweight and flexible command-line JSON processor; you could say it is like awk or sed, but for JSON syntax. It can be installed via apt on Ubuntu, or downloaded directly from GitHub.


2 Answers

Use the --slurp (or -s) switch:

./tool | jq --slurp 'group_by(.id)'

It outputs the following:

[
  [
    {
      "ts": "2017-08-15T21:20:47.029Z",
      "id": "123",
      "elapsed_ms": 10
    }
  ],
  [
    {
      "ts": "2017-08-15T21:20:47.044Z",
      "id": "456",
      "elapsed_ms": 13
    }
  ]
]

...which you can then process further. For example:

./tool | jq -s 'group_by(.id) | map({id: .[0].id, count: length})'
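With the sample input above, this produces:

[
  {
    "id": "123",
    "count": 1
  },
  {
    "id": "456",
    "count": 1
  }
]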
answered Sep 21 '22 by Roger Lipscombe


As @JeffMercado pointed out, jq handles streams of JSON just fine, but if you use group_by, you have to ensure its input is an array. In this case that can be done with the -s command-line option; if your jq has the inputs filter, it can also be done using that filter in conjunction with the -n option.

If you have a version of jq with inputs (available since jq 1.5), however, a better approach is to use the following streaming variant of group_by:

 # sort-free stream-oriented variant of group_by/1
 # f should always evaluate to a string.
 # Output: a stream of arrays, one array per group
 def GROUPS_BY(stream; f): reduce stream as $x ({}; .[$x|f] += [$x] ) | .[] ;

Usage example: GROUPS_BY(inputs; .id)

Note that you will want to use this with the -n command line option.
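For instance, with the sample input above (adding -c as well, so each group is printed compactly on one line):

./tool | jq -nc '
  def GROUPS_BY(stream; f): reduce stream as $x ({}; .[$x|f] += [$x]) | .[];
  GROUPS_BY(inputs; .id)'

...which emits one array per group:

[{"ts":"2017-08-15T21:20:47.029Z","id":"123","elapsed_ms":10}]
[{"ts":"2017-08-15T21:20:47.044Z","id":"456","elapsed_ms":13}]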

Such a streaming variant has two main advantages:

  1. it generally requires less memory in that it does not require a copy of the entire input stream to be kept in memory while it is being processed;
  2. it is potentially faster because it does not require any sort operation, unlike group_by/1.

Please note that the above definition of GROUPS_BY/2 follows the convention for such streaming filters in that it produces a stream. Other variants are of course possible.
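For example, a variant that gathers all the groups into a single array (like group_by's output shape, but without the sort) is a small change; the name GROUPS_BY_ARRAY here is just illustrative, not standard:

 def GROUPS_BY_ARRAY(stream; f): reduce stream as $x ({}; .[$x|f] += [$x]) | [.[]];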

Handling a large amount of data

The following illustrates how to economize on memory. Suppose the task is to produce a frequency count of .id values. The humdrum solution would be:

GROUPS_BY(inputs; .id) | [(.[0]|.id), length]

A more economical and indeed far better solution would be:

GROUPS_BY(inputs|.id; .) | [.[0], length]
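With the sample input, the economical variant runs as follows (again with -n, plus -c for compact output):

./tool | jq -nc '
  def GROUPS_BY(stream; f): reduce stream as $x ({}; .[$x|f] += [$x]) | .[];
  GROUPS_BY(inputs|.id; .) | [.[0], length]'

...producing:

["123",1]
["456",1]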
answered Sep 22 '22 by peak