Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Presto query: Find the key with maximum value in a map

Tags:

sql

presto

trino

I have a table

Name  pets
--------------
Andy  {dog:2, cat:1, bird:4}
John  {tiger:3, elephant:1, fish:2}
Mary  {dog:2, pig:2}

I want the find the pet type with maximum count for each person. In the event of tie, duplicate the row for each pet. The result should look like this:

Name  max_pet
------------------
Andy  bird
John  tiger
Mary  dog
Mary  pig

Currently, I exported the table and do it in python. But I am wondering can I achieve this using Presto/SQL query? Thanks!

like image 986
Edamame Avatar asked Mar 03 '23 05:03

Edamame


2 Answers

There are several ways to do this. One approach is to use UNNEST to convert the map into rows, with one row per map entry. You can then use the rank() window function to assign a ranking to the pets for each name, after which you select only the top ranked item(s).

WITH people (name, pets) AS (
  VALUES
    ('Andy', map_from_entries(array[('dog', 2), ('cat', 1), ('bird', 4)])),
    ('John', map_from_entries(array[('tiger', 3), ('elephant', 1), ('fish', 2)])),
    ('Mary', map_from_entries(array[('dog', 2), ('pig', 2)]))
)
SELECT name, pet AS max_pet
FROM (
    SELECT name, pet, count,
           rank() OVER (PARTITION BY name ORDER BY count DESC) rnk
    FROM people
    CROSS JOIN UNNEST(pets) AS t (pet, count)
)
WHERE rnk = 1;
 name | max_pet 
------+---------
 Andy | bird    
 John | tiger   
 Mary | dog     
 Mary | pig     
(4 rows)

Using UNNEST is simple to understand, but doesn't work well if you need to combine it with other operations, or if you have duplicate names.

Another approach is to convert the map into an array using map_entries(), use filter() to select the pet(s) with a count that equals the maximum count, then use transform() to only return the pet name. At this point, you have an array of the maximum pets. You can then UNNEST it into multiple rows, or keep it as an array for further processing. filter() and transform() utilize a lambda expression which is a Presto specific extension to SQL.

WITH people (name, pets) AS (
  VALUES
    ('Andy', map_from_entries(array[('dog', 2), ('cat', 1), ('bird', 4)])),
    ('John', map_from_entries(array[('tiger', 3), ('elephant', 1), ('fish', 2)])),
    ('Mary', map_from_entries(array[('dog', 2), ('pig', 2)]))
)
SELECT
    name,
    transform(
        filter(
            map_entries(pets),
            e -> e[2] = array_max(map_values(pets))),
        e -> e[1]) AS max_pets
FROM people;
 name |  max_pets  
------+------------
 Andy | [bird]     
 John | [tiger]    
 Mary | [dog, pig] 
(3 rows)
like image 116
David Phillips Avatar answered Mar 05 '23 12:03

David Phillips


For Athena users, where accessing row fields by index is not available (true at least for the 2nd version of the engine, have not checked the 3rd) the second approach from answer can be rewritten with map_keys and map access by index:

-- sample data
WITH people (name, pets) AS (
  VALUES
    ('Andy', map_from_entries(array[('dog', 2), ('cat', 1), ('bird', 4)])),
    ('John', map_from_entries(array[('tiger', 3), ('elephant', 1), ('fish', 2)])),
    ('Mary', map_from_entries(array[('dog', 2), ('pig', 2)]))
)

-- query
SELECT
    name,
    transform(
        filter(
            map_keys(pets),
            e -> pets[e] = array_max(map_values(pets))), -- we now that e is present in pets
        e -> e) AS max_pets
FROM people;

Output:

name max_pets
Andy [bird]
John [tiger]
Mary [dog, pig]

Or just use map_filter:

SELECT
    name,
    map_keys(
        map_filter(
            pets,
            (k,v) -> v = array_max(map_values(pets)))) max_pets
FROM people;
like image 29
Guru Stron Avatar answered Mar 05 '23 14:03

Guru Stron