Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Presto: is MAX_BY() deterministic

Tags:

presto

maxby

Is the function MAX_BY() deterministic. If I use MAX_() for two different columns both depending on a third one, will I get the same row result?

The presto documentation doesn't mention this. This documentation about mysql mention that it is not, so I'm not sure where to find this info.

I tested quickly with the following:

WITH my_table(id, arr, something) AS (
    VALUES
        (1, ARRAY['one'], 0.0),
        (2, ARRAY['two'], 0.0),
        (3, ARRAY['three'], 0.0),
        (4, ARRAY['four'], 0.0),
        (5, ARRAY['five'], 0.0),
        (6, ARRAY[''], 0.0)
)
SELECT
    MAX_BY(id,something),
    MAX_BY(arr,something)
FROM my_table

It returned the first row, so it doesn't feel arbitrary but also does not prove things.

Anyone out there able to help?

There is a related question to return multiple columns from a single MAX_BY() so I'm thinking that I need to use that solution to guarantee the attribute of the same row is selected: max_by with multiple return columns

like image 300
user1527152 Avatar asked Dec 14 '25 07:12

user1527152


2 Answers

No, in the case of ties, the result of max_by and min_by is arbitrary. It may appear to be deterministic, but that's not defined behavior and may change at some point.

If you want all the values to be consistent, you have to use the trick you referred to, where you pack all the columns of interest in a single value of type ROW:

SELECT max_by((x1, x2, x3), y) r
FROM (...) t(y, x1, x2, x3)
like image 140
Martin Traverso Avatar answered Dec 16 '25 21:12

Martin Traverso


It is probably safer, and more efficient as well, to use window functions:

select *
from (
    select t.*, row_number() over(order by something desc) rn
    from my_table t
) t
where rn = 1

For this simple case, a row-limiting clause is actually good enough:

select *
from my_table 
order by something desc
limit 1

Both query guarantee that the returned values all belong to the same row.

None, however, is deterministic, in the sense that consecutive executions of the same query might return a different row. If you want a stable result, then you need a column (or a set of columns) that can be used to uniquely identify each row: adding id to the order by clause would be just fine here.

like image 20
GMB Avatar answered Dec 16 '25 23:12

GMB



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!