postgresql (redshift) maximum value for a specific column

Question

I'm working on Redshift and have a table like this:

userid  oid version number_of_objects
1       ab  1       10
1       ab  2       20
1       ab  3       17
1       ab  4       16
1       ab  5       14
1       cd  1       5
1       cd  2       6
1       cd  3       9
1       cd  4       12
2       ef  1       4
2       ef  2       3
2       gh  1       16
2       gh  2       12
2       gh  3       21

For every oid, I would like to select the row with the maximum version number and get its userid and number_of_objects.

When I tried this, I unfortunately got the whole table back:

SELECT MAX(version), oid, userid, number_of_objects
FROM table
GROUP BY oid, userid, number_of_objects
LIMIT 10;

The result I'm actually looking for is:

userid  oid MAX(version)    number_of_objects
1       ab  5               14
1       cd  4               12
2       ef  2               3
2       gh  3               21

DISTINCT ON doesn't work either; Redshift reports:

SELECT DISTINCT ON is not supported

Do you have any idea?

UPDATE: In the meantime I came up with this workaround. I don't think it is the smartest solution, and it is very slow, but at least it works:

SELECT *
FROM table,
   (SELECT MAX(version) AS maxversion, oid, userid
    FROM table
    GROUP BY oid, userid
   ) AS maxtable
WHERE table.oid = maxtable.oid
  AND table.userid = maxtable.userid
  AND table.version = maxtable.maxversion
LIMIT 100;

Do you have any better solution?

Admin · Accepted Answer

Redshift supports window functions, so you might try this:

SELECT *
FROM (
  select oid,
         userid,
         version,
         number_of_objects,
         max(version) over (partition by oid, userid) as max_version
  from the_table
) t
where version = max_version;

I would expect that to be faster than a self join with a group by.

Another option would be to use the row_number() function:

SELECT *
FROM (
  select oid,
         userid,
         version,
         number_of_objects,
         row_number() over (partition by oid, userid order by version desc) as rn
  from the_table
) t
where rn = 1;

Which one to use is mostly a matter of personal taste. Performance-wise, I wouldn't expect a difference.
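
As a rough way to check both window-function queries against the sample data from the question, here is a minimal sketch that runs them in an in-memory SQLite database from Python. SQLite is only a stand-in for Redshift here (an assumption, and it needs SQLite 3.25 or newer for window functions). The helper name latest_versions and the ORDER BY added for a stable result are illustrative and not part of the original answer.

```python
import sqlite3

# Sample rows copied from the question: (userid, oid, version, number_of_objects)
ROWS = [
    (1, "ab", 1, 10), (1, "ab", 2, 20), (1, "ab", 3, 17), (1, "ab", 4, 16), (1, "ab", 5, 14),
    (1, "cd", 1, 5), (1, "cd", 2, 6), (1, "cd", 3, 9), (1, "cd", 4, 12),
    (2, "ef", 1, 4), (2, "ef", 2, 3),
    (2, "gh", 1, 16), (2, "gh", 2, 12), (2, "gh", 3, 21),
]

# Variant 1: compare each row's version with the per-group maximum.
MAX_OVER = """
SELECT userid, oid, version, number_of_objects
FROM (
  SELECT userid, oid, version, number_of_objects,
         max(version) OVER (PARTITION BY oid, userid) AS max_version
  FROM the_table
) t
WHERE version = max_version
ORDER BY userid, oid
"""

# Variant 2: number the rows per group, newest version first, and keep row 1.
ROW_NUMBER = """
SELECT userid, oid, version, number_of_objects
FROM (
  SELECT userid, oid, version, number_of_objects,
         row_number() OVER (PARTITION BY oid, userid ORDER BY version DESC) AS rn
  FROM the_table
) t
WHERE rn = 1
ORDER BY userid, oid
"""


def latest_versions(query):
    """Load the sample rows into an in-memory SQLite table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE the_table ("
            "userid INTEGER, oid TEXT, version INTEGER, number_of_objects INTEGER)"
        )
        conn.executemany("INSERT INTO the_table VALUES (?, ?, ?, ?)", ROWS)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in latest_versions(ROW_NUMBER):
        print(row)
```

On this data both variants should return the four rows the question asks for. They differ only when two rows in the same group share the maximum version: the max() variant returns all of the tied rows, while the row_number() variant returns just one of them.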

Tags:

sql

group-by

max

amazon-redshift

Tomi Mester
