I have a large table that I want to group by one column while producing an aggregate of another column. I don't care which value the aggregate returns, as long as it is a value that appears in one of the rows of the group. Something like coalesce(), i.e. an aggregate that produces the first non-null value it receives in the input set.
Of course, coalesce() is not an aggregate function, and I can't find any aggregate function in the docs that matches the behavior I need.
What can I do to retrieve any element for each group in a group by query?
I know I could use min() or max(), but I'd rather avoid comparing all values to each other to identify the result. A solution that avoids reading any more pages for a group that already has a value would be ideal. It's a big table (several GB on disk) with large groups (hundreds of thousands of rows).
I have seen that there are recursive CTEs and lateral joins. I am trying to wrap my head around them to see whether they might help...
Here's an example:
with t1(x) as (select * from generate_series(0, 10, 1)),
     t2(x, y) as (select * from t1, t1 t2)
select x
     , any_element(y) -- how can I simulate this any_element() aggregate function?
from t2
group by x
order by x
distinct on will return one arbitrary row per group:
with t1(x) as (select * from generate_series(0, 10, 1)),
     t2(x, y) as (select * from t1, t1 t2)
select distinct on (x) x,y
from t2
where y is not null
order by x
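If you want the any_element(y) syntax from the question, a user-defined aggregate is one way to get it. The following is only a sketch: any_element_sfunc is a name made up here, and any_element is the hypothetical name used in the question, not a built-in. Because the transition function is declared strict and the aggregate has no initial value, PostgreSQL skips null inputs and uses the first non-null input as the state, which the function then just keeps.

```sql
-- Hypothetical helper: keep whatever state is already there.
-- STRICT means null inputs are skipped and the first non-null
-- input seeds the state.
create function any_element_sfunc(anyelement, anyelement)
returns anyelement
language sql immutable strict
as 'select $1';

-- Hypothetical aggregate named after the placeholder in the question.
create aggregate any_element(anyelement) (
    sfunc = any_element_sfunc,
    stype = anyelement
);

with t1(x) as (select * from generate_series(0, 10, 1)),
     t2(x, y) as (select * from t1, t1 t2)
select x
     , any_element(y)
from t2
group by x
order by x;
```

Note that this only avoids the comparisons that min/max would do. The aggregate is still fed every row of every group, so it does not reduce the number of pages read.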
Or just use min/max as suggested in the comments.
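If the real goal is to stop reading a group as soon as one value has been found, the recursive CTE and lateral join mentioned in the question can be combined to emulate a loose index scan. This is a sketch under assumptions: the table big and the index big_x_y_idx are hypothetical names, filled here with the sample data from the question, and the approach is only likely to pay off when an index on (x, y) exists and there are few groups relative to the number of rows.

```sql
-- Hypothetical table and index, built from the question's sample data.
create temp table big as
select a.x, b.y
from generate_series(0, 10) as a(x)
cross join generate_series(0, 10) as b(y);

create index big_x_y_idx on big (x, y);

with recursive groups as (
    -- Anchor: the first row in (x, y) order.
    (select x, y
     from big
     where y is not null
     order by x, y
     limit 1)
    union all
    -- Step: for the previous x, jump to the first row with a larger x.
    select n.x, n.y
    from groups g
    cross join lateral (
        select b.x, b.y
        from big b
        where b.x > g.x
          and b.y is not null
        order by b.x, b.y
        limit 1
    ) n
)
select x, y
from groups
order by x;
```

Each step is meant to be answered by a single index probe, so the work is roughly proportional to the number of groups rather than the number of rows. Whether the planner actually chooses the index depends on table size and statistics, so check with explain analyze on the real table. As with the distinct on query, groups in which every y is null do not appear in the result.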