
How to group a JSON by a key and sort by its count?

Tags: json, jq

I start from a JSON Lines file similar to this:

{ "kw": "foo", "age": 1}
{ "kw": "foo", "age": 1}
{ "kw": "foo", "age": 1}
{ "kw": "bar", "age": 1}
{ "kw": "bar", "age": 1}

Note that each line is valid JSON, but the file as a whole is not.

The output I'm seeking is a list of keywords sorted by number of occurrences, like this:

[
    {"kw": "foo", "count": 3},
    {"kw": "bar", "count": 2}
]

I'm able to group and count the keywords using the slurp option:

jq --slurp '. | group_by(.kw) | .[] | {kw: .[0].kw, count: . | length }'

Output:

{"kw":"bar","count":2}
{"kw":"foo","count":3}

But:

  • This is not sorted
  • This is not a valid JSON array

A clumsy solution I've found is to pipe through jq twice :)

jq --slurp --compact-output '. | group_by(.kw) | .[] | {kw: .[0].kw, count: . | length }' sample.json \
| jq --slurp --compact-output '. | sort_by(.count)'

But I'm pretty sure someone smarter than me can find a more elegant solution.

IgnazioC asked Oct 17 '25 18:10


1 Answer

This is not sorted

That is not quite correct. group_by(.kw) sorts its groups by the value of the grouping key, much as sort_by(.kw) would, so the results come out ordered by .kw (bar before foo), not by count. See jq Manual - group_by(path_expression)
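To make the ordering visible, here is a minimal sketch that feeds a few of the question's records (inlined with printf, so no file is assumed) through group_by and prints only the key of each group. The variable name keys is just for illustration.

```shell
# Three records from the question, deliberately not in key order.
keys=$(printf '%s\n' \
  '{"kw": "foo", "age": 1}' \
  '{"kw": "bar", "age": 1}' \
  '{"kw": "foo", "age": 1}' \
  | jq --slurp --compact-output 'group_by(.kw) | map(.[0].kw)')

# Groups come back ordered by the grouping key, not by input order or size.
echo "$keys"   # prints ["bar","foo"]
```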

This is not a valid JSON array

Just enclose the operation in [ ... ]. The leading . | is also optional, so you can simply write:

jq --slurp --compact-output '[ group_by(.kw)[] | {kw: .[0].kw, count: length } ]'

If you mean sorting by .count, do an ascending sort and then reverse it:

jq --slurp --compact-output '[ group_by(.kw)[] | {kw: .[0].kw, count: length }] | sort_by(.count) | reverse'
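As a possible variation (a sketch, not the only way to do it), the descending order can also be obtained in a single sort by negating the numeric count, which avoids the separate reverse step. The snippet below recreates the question's sample input in a temporary file so that it runs on its own; the sample and result variable names are illustrative only.

```shell
# Recreate the question's sample input in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{ "kw": "foo", "age": 1}
{ "kw": "foo", "age": 1}
{ "kw": "foo", "age": 1}
{ "kw": "bar", "age": 1}
{ "kw": "bar", "age": 1}
EOF

# Group by keyword, count each group, then sort descending by negating the count.
result=$(jq --slurp --compact-output \
  'group_by(.kw) | map({kw: .[0].kw, count: length}) | sort_by(-.count)' \
  "$sample")

echo "$result"   # prints [{"kw":"foo","count":3},{"kw":"bar","count":2}]
rm -f "$sample"
```

Negation only works because count is a number; for a non-numeric sort key, sort_by(...) | reverse remains the simpler choice.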
Inian answered Oct 20 '25 08:10


