SQL-style GROUP BY aggregate functions in jq (COUNT, SUM, etc.)

Question

How can I emulate the COUNT aggregate function so that it behaves like its SQL original? Let's extend the question to cover other common SQL aggregate functions as well:

COUNT
SUM / MAX / MIN / AVG
ARRAY_AGG

The last one is not a standard SQL function; it comes from PostgreSQL, but it is quite useful.

The input is a stream of valid JSON objects. For demonstration, let's use a simple story of owners and their pets.

Model and data

Base relation: Owner

id name  age
 1 Adams  25
 2 Baker  55
 3 Clark  40
 4 Davis  31

Base relation: Pet

id name  litter owner_id
10 Bella      4        1
20 Lucy       2        1
30 Daisy      3        2
40 Molly      4        3
50 Lola       2        4
60 Sadie      4        4
70 Luna       3        4

Source

From the relations above we get a derived relation, Owner_Pet (the result of an SQL JOIN of Owner and Pet), presented in JSON format as the source data for our jq queries:

{ "owner_id": 1, "owner": "Adams", "age": 25, "pet_id": 10, "pet": "Bella", "litter": 4 }
{ "owner_id": 1, "owner": "Adams", "age": 25, "pet_id": 20, "pet": "Lucy",  "litter": 2 }
{ "owner_id": 2, "owner": "Baker", "age": 55, "pet_id": 30, "pet": "Daisy", "litter": 3 }
{ "owner_id": 3, "owner": "Clark", "age": 40, "pet_id": 40, "pet": "Molly", "litter": 4 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 50, "pet": "Lola",  "litter": 2 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 60, "pet": "Sadie", "litter": 4 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 70, "pet": "Luna",  "litter": 3 }

Requests

Here are sample requests and their expected output:

COUNT the number of pets per owner:

{ "owner_id": 1, "owner": "Adams", "age": 25, "pets_count": 2 }
{ "owner_id": 2, "owner": "Baker", "age": 55, "pets_count": 1 }
{ "owner_id": 3, "owner": "Clark", "age": 40, "pets_count": 1 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pets_count": 3 }

SUM up the number of whelps (litter) per owner and get their MAX (or MIN/AVG):

{ "owner_id": 1, "owner": "Adams", "age": 25, "litter_total": 6, "litter_max": 4 }
{ "owner_id": 2, "owner": "Baker", "age": 55, "litter_total": 3, "litter_max": 3 }
{ "owner_id": 3, "owner": "Clark", "age": 40, "litter_total": 4, "litter_max": 4 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "litter_total": 9, "litter_max": 4 }

ARRAY_AGG pets per owner:

{ "owner_id": 1, "owner": "Adams", "age": 25, "pets": [ "Bella", "Lucy" ] }
{ "owner_id": 2, "owner": "Baker", "age": 55, "pets": [ "Daisy" ] }
{ "owner_id": 3, "owner": "Clark", "age": 40, "pets": [ "Molly" ] }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pets": [ "Lola", "Sadie", "Luna" ] }


asked Jan 18 '18 12:01

Onkeltem

1 Answer

Here's an alternative that uses only basic jq builtins, without any custom functions. (I took the liberty of dropping the redundant columns, such as owner and age, from the output.)

Count

In> jq -s 'group_by(.owner_id) | map({owner_id: .[0].owner_id, pets_count: map(.pet) | length})'
Out> [{"owner_id": 1, "pets_count": 2}, ...]
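To reproduce the question's expected COUNT output exactly (one object per owner, including owner and age), a minimal sketch could copy the extra columns from the first row of each group and use .[] to emit a stream of objects instead of one array. The source data from the question is inlined through a heredoc so the snippet runs on its own; the shell variable name out is just for illustration.

```shell
# COUNT(*) per owner, shaped like the expected output in the question.
# -s slurps all input objects into one array; -c prints compact JSON, one object per line.
out=$(jq -s -c '
  group_by(.owner_id)              # one sub-array per owner, sorted by owner_id
  | .[]                            # emit each group on its own (a stream, like SQL rows)
  | { owner_id:   .[0].owner_id,   # grouped columns come from the first row
      owner:      .[0].owner,
      age:        .[0].age,
      pets_count: length }         # number of rows in the group
' <<'EOF'
{ "owner_id": 1, "owner": "Adams", "age": 25, "pet_id": 10, "pet": "Bella", "litter": 4 }
{ "owner_id": 1, "owner": "Adams", "age": 25, "pet_id": 20, "pet": "Lucy",  "litter": 2 }
{ "owner_id": 2, "owner": "Baker", "age": 55, "pet_id": 30, "pet": "Daisy", "litter": 3 }
{ "owner_id": 3, "owner": "Clark", "age": 40, "pet_id": 40, "pet": "Molly", "litter": 4 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 50, "pet": "Lola",  "litter": 2 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 60, "pet": "Sadie", "litter": 4 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 70, "pet": "Luna",  "litter": 3 }
EOF
)
printf '%s\n' "$out"
```

Taking owner and age from .[0] is only safe because they are functionally dependent on owner_id, which is the same reason SQL lets you list them in GROUP BY alongside the key.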

Sum

In> jq -s 'group_by(.owner_id) | map({owner_id: .[0].owner_id, sum: map(.litter) | add})'
Out> [{"owner_id": 1, "sum": 6}, ...]
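jq has no AVG builtin, but it follows from the same sum: divide add by length inside each group. A sketch under the same assumptions (question data inlined, illustrative variable out; the output key litter_avg is a hypothetical name):

```shell
# AVG per owner: SUM (add) divided by COUNT (length) of the group's litter values.
out=$(jq -s -c '
  group_by(.owner_id)
  | .[]
  | { owner_id:   .[0].owner_id,
      litter_avg: (map(.litter) | add / length) }   # hypothetical key name
' <<'EOF'
{ "owner_id": 1, "owner": "Adams", "age": 25, "pet_id": 10, "pet": "Bella", "litter": 4 }
{ "owner_id": 1, "owner": "Adams", "age": 25, "pet_id": 20, "pet": "Lucy",  "litter": 2 }
{ "owner_id": 2, "owner": "Baker", "age": 55, "pet_id": 30, "pet": "Daisy", "litter": 3 }
{ "owner_id": 3, "owner": "Clark", "age": 40, "pet_id": 40, "pet": "Molly", "litter": 4 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 50, "pet": "Lola",  "litter": 2 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 60, "pet": "Sadie", "litter": 4 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 70, "pet": "Luna",  "litter": 3 }
EOF
)
printf '%s\n' "$out"
```

Because every group produced by group_by has at least one row, length is never zero here, so the division is safe.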

Max

In> jq -s 'group_by(.owner_id) | map({owner_id: .[0].owner_id, max: map(.litter) | max})'
Out> [{"owner_id": 1, "max": 4}, ...]
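The question asks for SUM and MAX side by side. One way to sketch that, assuming the inlined question data, is to bind each group's litter values to a variable once so both aggregates can reuse it; MIN is the same pattern with min (noted in a comment).

```shell
# SUM and MAX per owner, matching the requested litter_total / litter_max output.
out=$(jq -s -c '
  group_by(.owner_id)
  | .[]
  | map(.litter) as $l              # litter sizes of this owner, bound once
  | { owner_id:     .[0].owner_id,
      owner:        .[0].owner,
      age:          .[0].age,
      litter_total: ($l | add),     # SUM
      litter_max:   ($l | max) }    # MAX; use ($l | min) for MIN
' <<'EOF'
{ "owner_id": 1, "owner": "Adams", "age": 25, "pet_id": 10, "pet": "Bella", "litter": 4 }
{ "owner_id": 1, "owner": "Adams", "age": 25, "pet_id": 20, "pet": "Lucy",  "litter": 2 }
{ "owner_id": 2, "owner": "Baker", "age": 55, "pet_id": 30, "pet": "Daisy", "litter": 3 }
{ "owner_id": 3, "owner": "Clark", "age": 40, "pet_id": 40, "pet": "Molly", "litter": 4 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 50, "pet": "Lola",  "litter": 2 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 60, "pet": "Sadie", "litter": 4 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 70, "pet": "Luna",  "litter": 3 }
EOF
)
printf '%s\n' "$out"
```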

Aggregate

In> jq -s 'group_by(.owner_id) | map({owner_id: .[0].owner_id, agg: map(.pet) })'
Out> [{"owner_id": 1, "agg": ["Bella", "Lucy"]}, ...]
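For the ARRAY_AGG request in the exact shape the question shows (with owner, age and a pets array), a sketch along the same lines, again with the question data inlined:

```shell
# ARRAY_AGG(pet) per owner, emitted as one object per line.
out=$(jq -s -c '
  group_by(.owner_id)
  | .[]
  | { owner_id: .[0].owner_id,
      owner:    .[0].owner,
      age:      .[0].age,
      pets:     map(.pet) }   # ARRAY_AGG(pet); group_by keeps the input order within a group
' <<'EOF'
{ "owner_id": 1, "owner": "Adams", "age": 25, "pet_id": 10, "pet": "Bella", "litter": 4 }
{ "owner_id": 1, "owner": "Adams", "age": 25, "pet_id": 20, "pet": "Lucy",  "litter": 2 }
{ "owner_id": 2, "owner": "Baker", "age": 55, "pet_id": 30, "pet": "Daisy", "litter": 3 }
{ "owner_id": 3, "owner": "Clark", "age": 40, "pet_id": 40, "pet": "Molly", "litter": 4 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 50, "pet": "Lola",  "litter": 2 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 60, "pet": "Sadie", "litter": 4 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 70, "pet": "Luna",  "litter": 3 }
EOF
)
printf '%s\n' "$out"
```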

Sure, these might not be the most efficient implementations, but they show nicely how to build each aggregate yourself. All that changes between the different aggregates is the expression inside the last map and the function after the pipe | (length, add, max).
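Since only the aggregate expression changes, the grouping could be factored into a small jq function. This is a sketch, not a builtin: owner_agg is a hypothetical name, it hard-codes grouping by owner_id, and its argument f is evaluated against each group array, so several aggregates can be combined in one object.

```shell
# owner_agg(f): hypothetical helper; f receives one group (an array of rows) as input.
out=$(jq -s -c '
  def owner_agg(f):
    group_by(.owner_id)
    | .[]
    | {owner_id: .[0].owner_id, owner: .[0].owner, age: .[0].age} + f;
  owner_agg({pets_count: length, pets: map(.pet)})   # COUNT and ARRAY_AGG together
' <<'EOF'
{ "owner_id": 1, "owner": "Adams", "age": 25, "pet_id": 10, "pet": "Bella", "litter": 4 }
{ "owner_id": 1, "owner": "Adams", "age": 25, "pet_id": 20, "pet": "Lucy",  "litter": 2 }
{ "owner_id": 2, "owner": "Baker", "age": 55, "pet_id": 30, "pet": "Daisy", "litter": 3 }
{ "owner_id": 3, "owner": "Clark", "age": 40, "pet_id": 40, "pet": "Molly", "litter": 4 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 50, "pet": "Lola",  "litter": 2 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 60, "pet": "Sadie", "litter": 4 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 70, "pet": "Luna",  "litter": 3 }
EOF
)
printf '%s\n' "$out"
```

Passing the aggregate as a filter argument mirrors the SELECT list of a GROUP BY query: the grouping and key columns stay fixed, and only the aggregate object varies.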

The outer map iterates over the groups, taking the group key (owner_id) from the first item, and the inner map iterates over the items of that same group. Not as pretty as SQL, but not much more complicated.
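To see the two levels of map at work, it can help to look at the intermediate result: group_by returns an array of arrays (one per owner, sorted by owner_id), and the nested map walks the rows inside each group. This sketch just projects pet_id so the grouping of the question data is visible.

```shell
# Outer map: one element per group. Inner map: one element per row in that group.
out=$(jq -s -c 'group_by(.owner_id) | map(map(.pet_id))' <<'EOF'
{ "owner_id": 1, "owner": "Adams", "age": 25, "pet_id": 10, "pet": "Bella", "litter": 4 }
{ "owner_id": 1, "owner": "Adams", "age": 25, "pet_id": 20, "pet": "Lucy",  "litter": 2 }
{ "owner_id": 2, "owner": "Baker", "age": 55, "pet_id": 30, "pet": "Daisy", "litter": 3 }
{ "owner_id": 3, "owner": "Clark", "age": 40, "pet_id": 40, "pet": "Molly", "litter": 4 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 50, "pet": "Lola",  "litter": 2 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 60, "pet": "Sadie", "litter": 4 }
{ "owner_id": 4, "owner": "Davis", "age": 31, "pet_id": 70, "pet": "Luna",  "litter": 3 }
EOF
)
printf '%s\n' "$out"
```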

I learned jq today and already managed to do this, so this should be encouraging for anyone getting started. jq is neither like sed nor like SQL, but it isn't terribly hard either.

answered Sep 30 '22 03:09

Cornelius Roemer


Tags: json, sql, group-by, aggregate-functions, jq