How do I select unique records by column with ActiveRecord and Postgresql

Question

Given the following records (the first row being the column names):

name              platform           other_columns     date
Eric              Ruby               something         somedate
Eric              Objective-C        something         somedate
Joe               Ruby               something         somedate

How do I retrieve a singular record with all columns, such that the name column is always unique in the results set? I would like the query in this example to return the first Eric (w/ Ruby) record.

I think the closest I've gotten is to use "select distinct on (name) *...", but that requires me to order by name first, when I actually want to order the records by the date column.

Order records by date
If there are multiple records with the same name, select one (which does not matter)
Select all columns

How do I achieve this in Rails on PostgreSQL?

mu is too short · Accepted Answer

You can't do a simple .group(:name) because that produces a GROUP BY name in your SQL when you'll be selecting ungrouped and unaggregated columns, that leaves ambiguity as to which row to pick and PostgreSQL (rightly IMHO) complains:

When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions, since there would be more than one possible value to return for an ungrouped column.

If you start adding more columns to your grouping with something like this:

T.group(T.columns.collect(&:name))

then you'll be grouping by things you don't want to and you'll end up pulling out the whole table and that's not what you want. If you try aggregating to avoid the grouping problem, you'll end up mixing different rows (i.e. one column will come from one row while another column will come from some other row) and that's not what you want either.

ActiveRecord really isn't built for this sort of thing but you can bend it to your will with some effort.

You're using AR so you presumably have an id column. If you have PostgreSQL 8.4 or higher, then you could use window functions as a sort of localized GROUP BY; you'll need to window twice: once to figure out the name/thedate pairs and again to pick just one id (just in case you have multiple rows with the same name and thedate which match the earliest thedate) and hence get a unique row:

select your_table.*
from your_table
where id in (
    -- You don't need DISTINCT here as the IN will take care of collapsing duplicates.
    select min(yt.id) over (partition by yt.name)
    from (
        select distinct name, min(thedate) over (partition by name) as thedate
        from your_table
    ) as dt
    join your_table as yt
      on yt.name = dt.name and yt.thedate = dt.thedate
)

Then wrap that in a find_by_sql and you have your objects.

If you're using Heroku with a shared database (or some other environment without 8.4 or higher), then you're stuck with PostgreSQL 8.3 and you won't have window functions. In that case, you'd probably want to filter out the duplicates in Ruby-land:

with_dups = YourTable.find_by_sql(%Q{
    select yt.*
    from your_table yt
    join (select name, min(thedate) as thedate from your_table group by name) as dt
      on yt.name = dt.name and yt.thedate = dt.thedate
});

# Clear out the duplicates, sorting by id ensures consistent results
unique_matches = with_dups.sort_by(&:id).group_by(&:name).map { |x| x.last.first }

If you're pretty sure that there won't be duplicate name/min(thedate) pairs then the 8.3-compatible solution might be your best bet; but, if there will be a lot of duplicates, then you want the database to do as much work as possible to avoid creating thousands of AR objects that you're just going to throw away.

Maybe someone else with stronger PostgreSQL-Fu than me will come along and offer something nicer.

How do I select unique records by column with ActiveRecord and Postgresql

Tags:

sql

postgresql

ruby-on-rails

ruby-on-rails-3

Eric M.

1 Answers

mu is too short

Recent Activity

Donate For Us

How do I select unique records by column with ActiveRecord and Postgresql

Tags:

sql

postgresql

ruby-on-rails

ruby-on-rails-3

Eric M.

1 Answers

mu is too short

Related questions

Recent Activity

Donate For Us