
Select distinct records based on one field while keeping other fields intact

I've got a table like this:

      table: searches
+------------------------------+
| id  |   address   |   date   |
+------------------------------+
| 1   | 123 foo st  | 03/01/13 |
| 2   | 123 foo st  | 03/02/13 |
| 3   | 456 foo st  | 03/02/13 |
| 4   | 567 foo st  | 03/01/13 |
| 5   | 456 foo st  | 03/01/13 |
| 6   | 567 foo st  | 03/01/13 |
+------------------------------+

And I want a result set like this:

+------------------------------+
| id  |   address   |   date   |
+------------------------------+
| 2   | 123 foo st  | 03/02/13 |
| 3   | 456 foo st  | 03/02/13 |
| 4   | 567 foo st  | 03/01/13 |
+------------------------------+

But ActiveRecord seems unable to achieve this result. Here's what I'm trying:

  • Model has a 'most_recent' scope: scope :most_recent, order('date_searched DESC')
  • Model.most_recent.uniq returns the full set (SELECT DISTINCT "searches".* FROM "searches" ORDER BY date DESC) -- obviously the query is not going to do what I want, but neither is selecting only one column. I need all columns, but only rows where the address is unique in the result set.
  • I could do something like Model.select('distinct(address), date, id'), but that feels...wrong.
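To pin down the target semantics, here is a minimal pure-Ruby sketch. It uses no ActiveRecord or database: the rows are plain hashes copied from the table above, and the `date` values are assumed to be mm/dd/yy strings. It keeps the whole row with the latest date for each address. The helper names `parse_date` and `latest_per_address` are hypothetical. Breaking date ties by the lowest id is also an assumption, chosen to match the expected output above, where id 4 wins over id 6 for 567 foo st.

```ruby
require 'date'

# Sample rows from the question's `searches` table (dates assumed mm/dd/yy).
ROWS = [
  { id: 1, address: '123 foo st', date: '03/01/13' },
  { id: 2, address: '123 foo st', date: '03/02/13' },
  { id: 3, address: '456 foo st', date: '03/02/13' },
  { id: 4, address: '567 foo st', date: '03/01/13' },
  { id: 5, address: '456 foo st', date: '03/01/13' },
  { id: 6, address: '567 foo st', date: '03/01/13' }
].freeze

# Parse an mm/dd/yy string so comparisons are chronological, not lexical.
def parse_date(str)
  Date.strptime(str, '%m/%d/%y')
end

# For each address, keep the entire row with the latest date.
# Ties on date go to the lowest id (the key [date, -id] is largest for it).
def latest_per_address(rows)
  rows.group_by { |r| r[:address] }
      .map { |_address, group| group.max_by { |r| [parse_date(r[:date]), -r[:id]] } }
      .sort_by { |r| r[:id] }
end
```

Because `max_by` returns one existing row per group, the id, address, and date in each result always belong together. With the sample data, `latest_per_address(ROWS)` returns the rows with ids 2, 3, and 4.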
asked Mar 21 '13 at 13:03 by Chris Cashwell





1 Answer

You could use a GROUP BY query:

select max(id), address, max(date) as latest 
       from searches 
       group by address 
       order by latest desc

According to SQL Fiddle, that does exactly what I think you want.

It's not quite the same as your required output, which doesn't seem to care which ID is returned. Still, the query has to specify some way of picking an ID for each address, and here that's done with the max aggregate function.

I don't think you'll have any luck with ActiveRecord's autogenerated query methods for this case. Instead, add your own query method to your model class that runs that SQL (for example via find_by_sql). It's completely standard SQL that'll also run on basically any other RDBMS.

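Edit: One big weakness of the query is that it doesn't necessarily return actual records. If the highest ID for a given address doesn't correlate with the highest date for that address, the resulting "record" combines values from different rows and won't match any row actually stored in the DB. Depending on the use case, that might or might not matter. Simply changing max(id) to a bare id doesn't reliably fix it, though. Oracle, PostgreSQL, and MySQL 5.7.5+ in its default ONLY_FULL_GROUP_BY mode reject a non-aggregated column that isn't in the GROUP BY. Where MySQL does accept one, it may return the id from any row in the group.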
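The sample data in the question already triggers this weakness. The pure-Ruby sketch below mimics the GROUP BY query without a database: rows are plain hashes, dates are assumed to be mm/dd/yy strings, and `aggregate_per_address` and `phantom_rows` are hypothetical helper names. It computes max(id) and max(date) independently per address, as the SQL aggregates do, then lists aggregated rows that match no stored row.

```ruby
require 'date'

# Sample rows from the question's `searches` table (dates assumed mm/dd/yy).
ROWS = [
  { id: 1, address: '123 foo st', date: '03/01/13' },
  { id: 2, address: '123 foo st', date: '03/02/13' },
  { id: 3, address: '456 foo st', date: '03/02/13' },
  { id: 4, address: '567 foo st', date: '03/01/13' },
  { id: 5, address: '456 foo st', date: '03/01/13' },
  { id: 6, address: '567 foo st', date: '03/01/13' }
].freeze

# Mimics `SELECT max(id), address, max(date) ... GROUP BY address`:
# each aggregate is computed on its own over the group's rows.
def aggregate_per_address(rows)
  rows.group_by { |r| r[:address] }.map do |address, group|
    latest = group.map { |r| Date.strptime(r[:date], '%m/%d/%y') }.max
    { id: group.map { |r| r[:id] }.max, address: address, date: latest.strftime('%m/%d/%y') }
  end
end

# Aggregated "records" that do not equal any row actually stored in the table.
def phantom_rows(rows)
  aggregate_per_address(rows).reject { |agg| rows.include?(agg) }
end
```

With this data, `phantom_rows(ROWS)` contains a single entry: id 5, 456 foo st, 03/02/13. For that address, max(id) comes from row 5 (dated 03/01/13), while max(date) comes from row 3.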

answered Sep 27 '22 at 22:09 by creinig
