
Aggregate the results of a jq query into a single array

Tags: json, jq

I have an input file that has independent JSON objects (i.e. not an array) and I want to filter a few fields from each of them and create an array with the resulting elements. It's basically a list of log statements in JSON format.

I am using jq for this, and it's working great, except that I can't aggregate all resulting objects into a single array.

The input is something like this:

{"name":"myname", "environment":"staging", "email":"[email protected]", "time":"2017-04-02T05:00:00.046Z"}
{"name":"myname", "environment":"staging", "email":"[email protected]", "time":"2017-02-02T05:00:00.046Z"}
...
{"name":"myname", "environment":"staging", "email":"[email protected]", "time":"2017-10-02T05:00:00.046Z"}
{"name":"myothername", "environment":"staging", "time":"2017-10-02T05:00:00.046Z"}

(Note that the last entry has no email field, and thus it will return a null value if not filtered)

From this list of objects I'd like to get only the fields email and time, and ignore the rest, so I used the following jq query:

jq '{email: (.email | values), time: (.time | values)}' input.json

Note that I use the values filter because the log messages are mixed: not all JSON objects have an email field, and values discards the resulting nulls, so those objects are skipped.

My problem now is that even though I get the desired objects, the output is again a stream of separate objects, and I'd like a single array.

I.e. I get something like

{"email":"[email protected]", "time":"2017-04-02T05:00:00.046Z"}
{"email":"[email protected]", "time":"2017-02-02T05:00:00.046Z"}
...
{"email":"[email protected]", "time":"2017-10-02T05:00:00.046Z"}

And I would like it like:

[
    {"email":"[email protected]", "time":"2017-04-02T05:00:00.046Z"},
    {"email":"[email protected]", "time":"2017-02-02T05:00:00.046Z"},
    ...,
    {"email":"[email protected]", "time":"2017-10-02T05:00:00.046Z"}
]

I've tried several different things, but I usually end up with the error Cannot index array with string "email", which tells me I'm doing something wrong with the array operations.

I tried wrapping the query in map(), i.e. map({.userEmail, .time}), tried slurping the data with -s and I tried using the |+ and |= operators.

I have also tried wrapping the query inside array brackets like [{email: (.email|values), time:.time }], but I get the same resulting objects except each of them is wrapped inside an array by itself, i.e.

[{"email":"[email protected]", "time":"2017-04-02T05:00:00.046Z"}]
[{"email":"[email protected]", "time":"2017-02-02T05:00:00.046Z"}]
...
[{"email":"[email protected]", "time":"2017-10-02T05:00:00.046Z"}]

It seems like it's probably an easy thing to do, or a common operation at least, but I am failing to find the correct query.

What, then, is the correct way to aggregate the query results into an array when the input is not an array?

Acapulco asked Jul 05 '19 22:07



2 Answers

Even better...

  1. Based on your sample data, your basic filter can be simplified to {email, time}

  2. In general, it is better to avoid "slurping" the input (e.g. to save memory). This can be accomplished in your case by using inputs with the -n command-line option.

Putting it all together:

jq -n '[inputs | {email, time }]' input.json

If there are some inputs that you want to filter out, you could use select, e.g.

jq -n '[inputs | select(.email) | {email, time } ]' input.json
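The two commands above can be tried on a small sample. The following is a minimal sketch, assuming jq 1.5 or later is installed; the temporary file and the example.com addresses are hypothetical placeholders, not data from the question.

```shell
#!/bin/sh
# Hypothetical sample data: three independent JSON objects, one per line.
# The addresses are placeholders, not values from the original post.
input=$(mktemp)
cat > "$input" <<'EOF'
{"name":"myname","environment":"staging","email":"a@example.com","time":"2017-04-02T05:00:00.046Z"}
{"name":"myname","environment":"staging","email":"b@example.com","time":"2017-02-02T05:00:00.046Z"}
{"name":"myothername","environment":"staging","time":"2017-10-02T05:00:00.046Z"}
EOF

# -n stops jq from reading the first object implicitly, so `inputs`
# yields every object in the file. -c prints compact, single-line JSON.
all=$(jq -nc '[inputs | {email, time}]' "$input")

# select(.email) drops objects whose email is missing or null.
with_email=$(jq -nc '[inputs | select(.email) | {email, time}]' "$input")

echo "$all"
echo "$with_email"
rm -f "$input"
```

Without select, the object that has no email still appears in the array, with email set to null; with select, only the two objects that carry an email remain.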
peak answered Nov 14 '22 05:11



After reading some more I found the result I wanted, which is a combination of the slurp option (-s) and map.

I realized that the query

jq -s 'map({email: (.email|values), time:.time })' input.json

would read all the input items into an array and then apply map() to it. As per the definition of map():

For any filter x, map(x) will run that filter for each element of the input array, and return the outputs in a new array

So the two combined gave me the result I needed.
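As a rough illustration of how the two pieces interact, here is a small sketch, assuming jq is installed; the two-line sample file and its example.com address are hypothetical. It also shows that slurping without map applies .email to the array itself, which is most likely where the Cannot index array with string "email" error came from.

```shell
#!/bin/sh
# Hypothetical sample data: one object with an email, one without.
input=$(mktemp)
cat > "$input" <<'EOF'
{"name":"myname","email":"a@example.com","time":"2017-04-02T05:00:00.046Z"}
{"name":"myothername","time":"2017-10-02T05:00:00.046Z"}
EOF

# -s (--slurp) wraps the whole input stream in one array before the filter runs.
slurped_len=$(jq -s 'length' "$input")

# map(f) applies f to each element and collects the outputs in a new array.
# (.email | values) emits nothing for a null email, so that object is dropped.
result=$(jq -sc 'map({email: (.email | values), time: .time})' "$input")

# Without map, .email is applied to the array itself and jq reports an error.
if jq -s '{email: .email}' "$input" >/dev/null 2>&1; then
  no_map_status=ok
else
  no_map_status=error
fi

echo "$slurped_len"
echo "$result"
echo "$no_map_status"
rm -f "$input"
```

The slurped array has two elements, but the mapped result keeps only the object that has an email, because an empty output inside the object construction suppresses the whole object.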

Acapulco answered Nov 14 '22 06:11
