Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

mysql when I join same table twice aggregate is wrong

Tags:

join

mysql

I basically have a table that holds counts for every date. I want to create a query that gives me the total # of counts over the entire table, as well as the total for yesterday. But when I try to join the table twice, the aggregates are off. Below is how you can replicate the results.

CREATE TABLE a (id int primary key);
CREATE TABLE b (a_id int, b_id int, date date, count int, primary key (a_id,b_id,date));
INSERT INTO a VALUES (1);
INSERT INTO b VALUES (1, 1, UTC_DATE(), 5);
INSERT INTO b VALUES (1, 2, UTC_DATE(), 10);
INSERT INTO b VALUES (1, 1, UTC_DATE()-1, 7);
INSERT INTO b VALUES (1, 2, UTC_DATE()-1, 12);

SELECT A.id,SUM(B.count) AS total_count,SUM(Y.count) AS y FROM a AS A 
LEFT JOIN b AS B ON (B.a_id=A.id) 
LEFT JOIN b AS Y ON (Y.a_id=A.id AND Y.date=UTC_DATE()-1)
GROUP BY A.id;

Results in:
+----+-------------+------+
| id | total_count | y    |
+----+-------------+------+
|  1 |          68 |   76 |
+----+-------------+------+


The correct result should be:
+----+-------------+------+
| id | total_count | y    |
+----+-------------+------+
|  1 |          34 |   22 |
+----+-------------+------+

What's going on here? Is this a bug in mysql or am I not understanding how the joins are working.

like image 516
Jeremy McJunkin Avatar asked Dec 12 '12 20:12

Jeremy McJunkin


2 Answers

No, it's not a bug in MySQL.

Your JOIN conditions are generating "duplicate" rows. (Remove the aggregate functions and the GROUP BY, and you'll see what's happening.

That row from table "a" is matching four rows from table "b". That's all fine and good. But when you add the join to the third table ("y"), each row returned from that third "y" table (two rows) is being "matched" to every row from the "b" table... so you wind up with a total of eight rows in your result set. (That's why the "total_count" is getting doubled.)

To get the result set you specify, you don't need to join that table "b" second time. Instead, just use a conditional test to determine whether that "count" should be included in the "y" total or not.

e.g.

SELECT a.id
     , SUM(b.count) AS total_count
     , SUM(IF(b.date=UTC_DATE()-1 ,b.count,0)) AS y
  FROM a a
  LEFT
  JOIN b b ON (b.a_id=a.id)
 GROUP BY a.id;

Note that the MySQL IF expression can be replaced with an equivalent ANSI CASE expression for improved portability:

     , SUM(CASE WHEN b.date=UTC_DATE()-1 THEN b.count ELSE 0 END) AS y

If you did want to do JOIN to that "b" table a second time, you would want the JOIN condition to be such that a row from "y" would match, at most, ONE row from "b", so as not to introduce any duplicates. So you'd basically need the join condition to include all of the columns in the primary key.

(Note that the predicates in the join condition for table "y" guarantee that each from from "y" will match no more than ONE row from "b"):

SELECT a.id
     , SUM(b.count) AS total_count
     , SUM(y.count) AS y
  FROM a a
  LEFT
  JOIN b b
    ON b.a_id=a.id
  LEFT
  JOIN b y 
    ON y.a_id = b.a_id
   AND y.b_id = b.b_id
   AND y.date = b.date
   AND y.date = UTC_DATE()-1
 GROUP BY a.id;

(To get the first statement to return an identical resultset, with a potential NULL in place of a zero, you'd need to replace the '0' constant in the IF expression with 'NULL'.

     , SUM(IF(b.date=UTC_DATE()-1 ,b.count,NULL)) AS y
like image 91
spencer7593 Avatar answered Nov 15 '22 03:11

spencer7593


SELECT A.id,b_count AS total_count,y_count as y
FROM a AS A 
LEFT JOIN (select a_id,SUM(B.Count) b_count from b 
               group by B.A_id) AS B1 ON (B1.a_id=A.id) 
LEFT JOIN (select a_id,SUM(Count) y_count from b
               where date=UTC_DATE()-1
           group by B.A_id) AS Y ON (Y.a_id=A.id) 

SQLFiddle Demo

like image 26
valex Avatar answered Nov 15 '22 03:11

valex