Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PG::Error: SELECT DISTINCT, ORDER BY expressions must appear in select list

ActionView::Template::Error (PG::Error: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list

I'm creating an events website and I'm trying to sort the rendered rsvps by the start time of the event. There are a lot of RSVPS so I'm grouping them with distinct, but I've been having a lot of difficulty over the last few days on sorting the results without this error popping up on PG. I've looked at some of the previous questions on the topic and am still pretty lost. How can I get this to work? Thank you so much!

@rsvps = Rsvp.where(:voter_id => current_user.following.collect {|f| f["id"]}, :status => 'going').where("start_time > ? AND start_time < ?", Time.now, Time.now + 1.month).order("count_all desc").count(:group => :event_id).collect { |f| f[0] }  <%= render :partial => 'rsvps/rsvp', :collection => Rsvp.where(:event_id => @rsvps).select("DISTINCT(event_id)").order('start_time asc') %> 
like image 507
Andrew Avatar asked Oct 02 '12 15:10

Andrew


2 Answers

I know this is a rather old question, but I just went through a small example in my head which helped me understand why Postgres has this seemingly odd restriction on SELECT DISTINCT / ORDER BY columns.

Imagine you have the following data in your Rsvp table:

 event_id |        start_time ----------+------------------------     0     | Mar 17, 2013  12:00:00     1     |  Jan 1, 1970  00:00:00     1     | Aug 21, 2013  16:30:00     2     |  Jun 9, 2012  08:45:00 

Now you want to grab a list of distinct event_ids, ordered by their respective start_times. But where should 1 go? Should it come first, because the one tuple starts on Jan 1, 1970, or should it go last because of the Aug 21, 2013?

As the database system can't make that decision for you and the syntax of the query can't depend on the actual data it might be operating on (assuming event_id is unique), we are restricted to ordering only by columns from the SELECT clause.

As for the actual question - an alternative to Matthew's answer is using an aggregate function like MIN or MAX for the sorting:

  SELECT event_id     FROM Rsvp GROUP BY event_id ORDER BY MIN(start_time) 

The explicit grouping and aggregation on start_time permit the database to come up with a unambiguous ordering of the result tuples. Note however, that readability is definitely an issue in this case ;)

like image 109
AdrianoKF Avatar answered Oct 10 '22 18:10

AdrianoKF


The ORDER BY clause can only be applied after the DISTINCT has been applied. Since only the fields in the SELECT statement are taken into consideration for the DISTINCT operations, those are the only fields may be used in the ORDER BY.

Logically, if you just want a distinct list of event_id values, what order they occur in should be irrelevant. If order does matter, then you should add the start_time to the SELECT list so that there is context for the order.

Also, these two SELECT clauses are NOT equivalent, so be careful:

SELECT DISTINCT(event_id, start_time) FROM ...  SELECT DISTINCT event_id, start_time FROM ... 

The second is the form you want. The first will return a series of records with the data represented as a ROW construct (a single column with a tuple inside). The second will return normal columns of data output. It only works as expected in the single-column case where the ROW construct is reduced down since it is only a single column.

like image 39
Matthew Wood Avatar answered Oct 10 '22 20:10

Matthew Wood