I have an input file that has independent JSON objects (i.e. not an array) and I want to filter a few fields from each of them and create an array with the resulting elements. It's basically a list of log statements in JSON format.
I am using jq for this, and it works well, except that I can't aggregate all the resulting objects into a single array.
The input is something like this:
{"name":"myname", "environment":"staging", "email":"[email protected]", "time":"2017-04-02T05:00:00.046Z"}
{"name":"myname", "environment":"staging", "email":"[email protected]", "time":"2017-02-02T05:00:00.046Z"}
...
{"name":"myname", "environment":"staging", "email":"[email protected]", "time":"2017-10-02T05:00:00.046Z"}
{"name":"myothername", "environment":"staging", "time":"2017-10-02T05:00:00.046Z"}
(Note that the last entry has no email field, so it yields a null value for email if it is not filtered out.)
From this list of objects I'd like to get only the fields email and time, and ignore the rest, so I used the following jq query:
jq '{email: (.email | values), time: (.time | values)}' input.json
Note that I use the values filter because the log messages are mixed: not all JSON objects have the email field, and I want to skip the ones that don't.
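To make the effect of values concrete, here is a minimal sketch. The two records and their addresses are made up for illustration and are not taken from the original logs; the second record deliberately has no email key. It assumes jq is installed and on the PATH.

```shell
# Hypothetical sample stream: the second record has no "email" key.
sample='{"name":"a","email":"a@example.com","time":"2017-04-02T05:00:00.046Z"}
{"name":"b","time":"2017-10-02T05:00:00.046Z"}'

# values behaves like select(. != null): for the second record
# (.email | values) emits nothing, so no object is built for it at all.
with_values=$(printf '%s\n' "$sample" |
  jq -c '{email: (.email | values), time: (.time | values)}')

# Without values the second record is kept, with "email" set to null.
without_values=$(printf '%s\n' "$sample" | jq -c '{email, time}')

echo "with values:"
echo "$with_values"
echo "without values:"
echo "$without_values"
```

The first command prints one object and the second prints two, the last of which carries a null email.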
My problem is that even though I get the desired objects, I get a stream of separate objects again, and I'd like a single array.
I.e. I get something like
{"email":"[email protected]", "time":"2017-04-02T05:00:00.046Z"}
{"email":"[email protected]", "time":"2017-02-02T05:00:00.046Z"}
...
{"email":"[email protected]", "time":"2017-10-02T05:00:00.046Z"}
And I would like it like:
[
{"email":"[email protected]", "time":"2017-04-02T05:00:00.046Z"},
{"email":"[email protected]", "time":"2017-02-02T05:00:00.046Z"},
...,
{"email":"[email protected]", "time":"2017-10-02T05:00:00.046Z"}
]
I've tried several different things, but I usually end up with the error Cannot index array with string "email", which tells me I'm doing something wrong with the array operations.
I tried wrapping the query in map(), i.e. map({.userEmail, .time}), tried slurping the data with -s, and tried using the |+ and |= operators.
I have also tried wrapping the query inside array brackets, like [{email: (.email|values), time:.time }], but I get the same resulting objects, except that each of them is wrapped in an array by itself, i.e.
[{"email":"[email protected]", "time":"2017-04-02T05:00:00.046Z"}]
[{"email":"[email protected]", "time":"2017-02-02T05:00:00.046Z"}]
...
[{"email":"[email protected]", "time":"2017-10-02T05:00:00.046Z"}]
It seems like it's probably an easy thing to do, or a common operation at least, but I am failing to find the correct query.
What, then, is the correct way to aggregate the query results into an array when the input is not an array?
Even better...
Based on your sample data, your basic filter can be simplified to {email, time}.
In general, it is better to avoid "slurping" the input (e.g. to save memory). In your case this can be done by using inputs with the -n command-line option.
Putting it all together:
jq -n '[inputs | {email, time }]' input.json
If there are some inputs that you want to filter out, you could use select, e.g.
jq -n '[inputs | select(.email) | {email, time } ]' input.json
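As a rough check of the two commands above, the following sketch runs them against a small stream fed on standard input instead of input.json. The three records and their addresses are hypothetical; the last one has no email key. It assumes jq 1.5 or later, where inputs is available.

```shell
# Hypothetical sample stream: three records, the last one without "email".
sample='{"name":"a","environment":"staging","email":"a@example.com","time":"2017-04-02T05:00:00.046Z"}
{"name":"a","environment":"staging","email":"b@example.com","time":"2017-02-02T05:00:00.046Z"}
{"name":"c","environment":"staging","time":"2017-10-02T05:00:00.046Z"}'

# -n stops jq from consuming an input implicitly; inputs then yields every
# record in turn, and the surrounding [ ... ] collects the results.
all=$(printf '%s\n' "$sample" | jq -n -c '[inputs | {email, time}]')

# select(.email) keeps only records whose email is neither null nor false.
filtered=$(printf '%s\n' "$sample" |
  jq -n -c '[inputs | select(.email) | {email, time}]')

echo "$all"
echo "$filtered"
```

Without select, the record that lacks an email still appears in the array with "email": null; with select it is dropped, which matches what the values filter in the question does.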
After reading some more I found the result I wanted, which is a combination of the slurp option (-s) and map.
I realized that the query
jq -s 'map({email: (.email|values), time:.time })' input.json
would read all the input items into an array and then, as per the definition of map():
For any filter x, map(x) will run that filter for each element of the input array, and return the outputs in a new array
So the two combined gave me the result I needed.
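A small sketch of why the combination matters, using made-up records (the addresses are hypothetical, and the last record has no email key). With -s alone the filter receives the whole array, so indexing it with .email fails; wrapping the filter in map applies it to each element instead.

```shell
# Hypothetical sample stream: three records, the last one without "email".
sample='{"name":"a","email":"a@example.com","time":"2017-04-02T05:00:00.046Z"}
{"name":"a","email":"b@example.com","time":"2017-02-02T05:00:00.046Z"}
{"name":"c","time":"2017-10-02T05:00:00.046Z"}'

# With -s the input is a single array, so .email is applied to an array and
# jq exits with an error (Cannot index array with string "email").
if printf '%s\n' "$sample" | jq -s -c '{email: .email}' >/dev/null 2>&1; then
  slurp_without_map="ok"
else
  slurp_without_map="error"
fi

# map(f) runs f on each element of the slurped array and collects the
# outputs; the record without an email produces no output and is dropped.
result=$(printf '%s\n' "$sample" |
  jq -s -c 'map({email: (.email | values), time: .time})')

echo "slurp without map: $slurp_without_map"
echo "$result"
```

Note that -s holds the entire input in memory before the filter runs, which is fine for small log files but is the cost the inputs-based approach avoids.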