
MySQL group by and max returns wrong rows

I have two tables and I am trying to find the post with the highest score per day.

CREATE TABLE IF NOT EXISTS `posts_points` (
  `post_id` int(10) unsigned NOT NULL,
  `comments` smallint(5) unsigned NOT NULL,
  `likes` smallint(5) unsigned NOT NULL,
  `favorites` smallint(5) unsigned NOT NULL,
   PRIMARY KEY (`post_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;


CREATE TABLE IF NOT EXISTS `posts` (
  `profile_id` int(10) unsigned NOT NULL,
  `post_id` int(10) unsigned NOT NULL,
  `pubdate_utc` datetime NOT NULL,
  PRIMARY KEY (`post_id`),
  KEY `profile_id` (`profile_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;

I have tried the query below. It returns the correct maximum score per day, but the other columns seem to come from arbitrary rows rather than from the row that holds that score. What am I doing wrong?

SELECT p.post_id, p.profile_id
   , MAX(t1.score)
   , DATE_FORMAT(t1.pubdate_utc, '%d %b') post_date
   , DATE(t1.pubdate_utc) mydate
FROM
(
   SELECT p.profile_id, p.post_id, p.pubdate_utc
      , (pp.comments + pp.likes + pp.favorites) AS score
   FROM posts p 
   INNER JOIN posts_points pp ON p.post_id = pp.post_id
) t1
INNER JOIN posts p ON t1.post_id = p.post_id
   AND t1.pubdate_utc = p.pubdate_utc
GROUP BY mydate
ORDER BY mydate DESC
LIMIT 18;
user1070125 asked Nov 28 '11 21:11





1 Answer

I run into this problem all the time. When MySQL runs an aggregate function and ONLY_FULL_GROUP_BY is not enabled, it does not tie the non-aggregated columns to the row that holds the MAX. It returns values from an arbitrary row in each group, which in practice is usually the first row it reads, although that is not guaranteed. So what you have to do is order the data in an inner query so that the maximums come first in their groups. See if this works for you:

SELECT t.post_id,
       t.profile_id,
       t.score,
       t.pubdate_utc
FROM (SELECT p.profile_id,
             p.post_id,
             p.pubdate_utc,
             (pp.comments + pp.likes + pp.favorites) score
      FROM posts p
      JOIN posts_points pp ON p.post_id = pp.post_id
      WHERE p.pubdate_utc >= DATE_ADD(DATE(NOW()), INTERVAL -17 DAY)
      ORDER BY score DESC
     ) t
GROUP BY DATE(t.pubdate_utc) DESC
;

Notice that I use no MAX function here. Ordering by score descending in the inner query and then grouping by date in the outer query pulls up the highest-scoring row for each date. Also notice that I put the WHERE clause in the inner query. Inner queries like this (though sometimes necessary) are not very efficient, since the derived table has no indexes for the outer query to use, so make sure your inner result set is as small as it can be. Lastly, notice the GROUP BY DATE(t.pubdate_utc). If I did not reduce pubdate_utc to just the date, the rows would be grouped by the full timestamp, and there would be a lot more than 18 results.

Edit: Changed to INTERVAL -17 DAY to give up to 18 results instead of 19.
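
If you would rather not depend on which row MySQL happens to pick for the non-aggregated columns, here is a sketch of the usual join-based groupwise maximum. It computes the maximum score per day in one derived table and joins it back to the scored posts. The aliases s and m and the column names post_day and max_score are my own names for illustration, and I have not run this against your data, so treat it as a starting point rather than a tested answer.

```sql
-- Sketch only: aliases s, m, post_day and max_score are illustrative names.
SELECT s.post_id,
       s.profile_id,
       s.score,
       s.pubdate_utc
FROM (
      -- every post in the date window, with its computed score
      SELECT p.post_id,
             p.profile_id,
             p.pubdate_utc,
             (pp.comments + pp.likes + pp.favorites) AS score
      FROM posts p
      JOIN posts_points pp ON p.post_id = pp.post_id
      WHERE p.pubdate_utc >= DATE_ADD(DATE(NOW()), INTERVAL -17 DAY)
     ) s
JOIN (
      -- the highest score for each calendar day in the same window
      SELECT DATE(p.pubdate_utc) AS post_day,
             MAX(pp.comments + pp.likes + pp.favorites) AS max_score
      FROM posts p
      JOIN posts_points pp ON p.post_id = pp.post_id
      WHERE p.pubdate_utc >= DATE_ADD(DATE(NOW()), INTERVAL -17 DAY)
      GROUP BY DATE(p.pubdate_utc)
     ) m ON m.post_day = DATE(s.pubdate_utc)
        AND m.max_score = s.score
ORDER BY m.post_day DESC;
```

One difference from the query above: if two posts tie for the highest score on the same day, this returns both of them, so you can get more than one row per day. The score expression is also computed twice, so it is more work than the ordered-subquery version.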

Kasey Speakman answered Oct 02 '22 12:10
