
MySQL shows incorrect rows when using GROUP BY

I have two tables:

article('id', 'ticket_id', 'incoming_time', 'to', 'from', 'message')
ticket('id', 'queue_id')

where tickets represent a thread of emails between support staff and customers, and articles are the individual messages that compose a thread.

I'm looking to find the article with the highest incoming time (expressed as a unix timestamp) for each ticket_id, and this is the query I'm currently using:

SELECT article.* , MAX(article.incoming_time) as maxtime
FROM ticket, article
WHERE ticket.id = article.ticket_id
AND ticket.queue_id = 1
GROUP BY article.ticket_id

For example,

:article:
id --- ticket_id --- incoming_time --- to ------- from ------- message --------
11     1             1234567           help@      client@      I need help...   
12     1             1235433           client@    help@        How can we help?
13     1             1240321           help@      client@      Want food!    
...

:ticket:
id --- queue_id
1      1
...

But the result looks to be the row with the smallest article id instead of what I'm looking for which is the article with the highest incoming time.

Any advice would be greatly appreciated!

Han asked Nov 30 '22 12:11



2 Answers

This is a classic hurdle that most MySQL programmers bump into.

  • You have a column ticket_id that is the argument to GROUP BY. Distinct values in this column define the groups.
  • You have a column incoming_time that is the argument to MAX(). The greatest value in this column over the rows in each group is returned as the value of MAX().
  • You have all other columns of table article. The values returned for these columns are arbitrary, not from the same row where the MAX() value occurs.

The database cannot infer that you want values from the same row where the max value occurs.

Think about the following cases:

  • There are multiple rows where the same max value occurs. Which row should be used to show the columns of article.*?

  • You write a query that returns both the MIN() and the MAX(). This is legal, but which row should article.* show?

    SELECT article.* , MIN(article.incoming_time), MAX(article.incoming_time)
    FROM ticket, article
    WHERE ticket.id = article.ticket_id
    AND ticket.queue_id = 1
    GROUP BY article.ticket_id
    
  • You use an aggregate function such as AVG() or SUM(), where no row has that value. How is the database to guess which row to display?

    SELECT article.* , AVG(article.incoming_time)
    FROM ticket, article
    WHERE ticket.id = article.ticket_id
    AND ticket.queue_id = 1
    GROUP BY article.ticket_id
    

In most brands of database -- as well as the SQL standard itself -- you aren't allowed to write a query like this, because of the ambiguity. You can't include any column in the select-list that isn't inside an aggregate function or named in the GROUP BY clause.
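As an illustration of that rule, here is a sketch of the original query reduced to a form that should be accepted anywhere: it selects only the grouping column and the aggregate. It uses the tables from the question; the explicit JOIN syntax and the aliases t and a are my own choices.

```sql
-- Every select-list column is either named in GROUP BY or wrapped in an
-- aggregate, so each output value is well defined: one row per ticket.
SELECT a.ticket_id, MAX(a.incoming_time) AS maxtime
FROM ticket t
JOIN article a ON t.id = a.ticket_id
WHERE t.queue_id = 1
GROUP BY a.ticket_id;
```

This tells you when the latest article of each ticket arrived, but not which article it was. Getting the rest of that row takes a different query shape.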

MySQL is more permissive. It lets you do this, and leaves it up to you to write queries without ambiguity. If you do have ambiguity, the values it returns are not guaranteed. In practice they tend to come from the row that is physically first in the group, but this is up to the storage engine.

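For what it's worth, SQLite also accepts queries like this, but when I tried it, it chose the last row in the group to resolve the ambiguity. Go figure. (Newer SQLite versions document a special case: with a single MIN() or MAX() in the query, the bare columns come from the row holding that value; otherwise the row is arbitrary.) If the SQL standard doesn't say what to do, it's up to the vendor implementation.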
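Note that MySQL's permissive behavior depends on the server's SQL mode. Since MySQL 5.7.5 the ONLY_FULL_GROUP_BY mode is part of the default sql_mode, and with it enabled the query from the question is rejected with error 1055 instead of returning an arbitrary row. A minimal sketch for checking the mode and turning it on for the current session (it assumes the mode is not already set):

```sql
-- Show which SQL modes are active in this session.
SELECT @@SESSION.sql_mode;

-- Append ONLY_FULL_GROUP_BY to the session's current modes.
-- This affects only the current connection, not the server default.
SET SESSION sql_mode = CONCAT(@@SESSION.sql_mode, ',ONLY_FULL_GROUP_BY');
```

Leaving the mode on while developing is a cheap way to catch ambiguous GROUP BY queries before they return misleading rows.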

Here's a query that can solve your problem for you:

SELECT a1.* , a1.incoming_time AS maxtime
FROM ticket t JOIN article a1 ON (t.id = a1.ticket_id)
LEFT OUTER JOIN article a2 ON (t.id = a2.ticket_id 
  AND a1.incoming_time < a2.incoming_time)
WHERE t.queue_id = 1
  AND a2.ticket_id IS NULL;

In other words, look for a row (a1) for which there is no other row (a2) with the same ticket_id and a greater incoming_time. If no greater incoming_time is found, the LEFT OUTER JOIN returns NULL instead of a match.
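One caveat ties back to the first case listed earlier: if two articles of the same ticket share the greatest incoming_time, the query above returns both of them. If you need exactly one row per ticket, a possible variant breaks ties with the article id. This is only a sketch, and it assumes article.id is unique (for example, the primary key):

```sql
-- a2 "beats" a1 if it is newer, or equally new with a higher id.
-- Only the article that nothing beats survives the IS NULL filter.
SELECT a1.*, a1.incoming_time AS maxtime
FROM ticket t
JOIN article a1 ON (t.id = a1.ticket_id)
LEFT OUTER JOIN article a2 ON (t.id = a2.ticket_id
  AND (a1.incoming_time < a2.incoming_time
    OR (a1.incoming_time = a2.incoming_time AND a1.id < a2.id)))
WHERE t.queue_id = 1
  AND a2.ticket_id IS NULL;
```

With the sample rows from the question, this should return article 13 for ticket 1, since 1240321 is the greatest incoming_time in that group.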

Bill Karwin answered Dec 04 '22 07:12



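SELECT a1.* FROM article a1 
JOIN 
  -- latest incoming_time per ticket in queue 1
  (SELECT a2.ticket_id, MAX(a2.incoming_time) AS maxtime
   FROM article a2
   JOIN ticket ON (a2.ticket_id=ticket.id)
   WHERE ticket.queue_id=1
   GROUP BY a2.ticket_id) xx
  ON (a1.ticket_id=xx.ticket_id AND a1.incoming_time=xx.maxtime);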

Alex Martelli answered Dec 04 '22 09:12
