Given the following records (the first row being the column names):
name platform other_columns date
Eric Ruby something somedate
Eric Objective-C something somedate
Joe Ruby something somedate
How do I retrieve a singular record with all columns, such that the name column is always unique in the results set? I would like the query in this example to return the first Eric (w/ Ruby) record.
I think the closest I've gotten is to use "select distinct on (name) *...", but that requires me to order by name first, when I actually want to order the records by the date column.
How do I achieve this in Rails on PostgreSQL?
You can't do a simple .group(:name)
because that produces a GROUP BY name
in your SQL when you'll be selecting ungrouped and unaggregated columns, that leaves ambiguity as to which row to pick and PostgreSQL (rightly IMHO) complains:
When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions, since there would be more than one possible value to return for an ungrouped column.
If you start adding more columns to your grouping with something like this:
T.group(T.columns.collect(&:name))
then you'll be grouping by things you don't want to and you'll end up pulling out the whole table and that's not what you want. If you try aggregating to avoid the grouping problem, you'll end up mixing different rows (i.e. one column will come from one row while another column will come from some other row) and that's not what you want either.
ActiveRecord really isn't built for this sort of thing but you can bend it to your will with some effort.
You're using AR so you presumably have an id
column. If you have PostgreSQL 8.4 or higher, then you could use window functions as a sort of localized GROUP BY; you'll need to window twice: once to figure out the name
/thedate
pairs and again to pick just one id
(just in case you have multiple rows with the same name
and thedate
which match the earliest thedate
) and hence get a unique row:
select your_table.*
from your_table
where id in (
-- You don't need DISTINCT here as the IN will take care of collapsing duplicates.
select min(yt.id) over (partition by yt.name)
from (
select distinct name, min(thedate) over (partition by name) as thedate
from your_table
) as dt
join your_table as yt
on yt.name = dt.name and yt.thedate = dt.thedate
)
Then wrap that in a find_by_sql
and you have your objects.
If you're using Heroku with a shared database (or some other environment without 8.4 or higher), then you're stuck with PostgreSQL 8.3 and you won't have window functions. In that case, you'd probably want to filter out the duplicates in Ruby-land:
with_dups = YourTable.find_by_sql(%Q{
select yt.*
from your_table yt
join (select name, min(thedate) as thedate from your_table group by name) as dt
on yt.name = dt.name and yt.thedate = dt.thedate
});
# Clear out the duplicates, sorting by id ensures consistent results
unique_matches = with_dups.sort_by(&:id).group_by(&:name).map { |x| x.last.first }
If you're pretty sure that there won't be duplicate name
/min(thedate)
pairs then the 8.3-compatible solution might be your best bet; but, if there will be a lot of duplicates, then you want the database to do as much work as possible to avoid creating thousands of AR objects that you're just going to throw away.
Maybe someone else with stronger PostgreSQL-Fu than me will come along and offer something nicer.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With