I basically have a table that holds counts for every date. I want to create a query that gives me the total # of counts over the entire table, as well as the total for yesterday. But when I try to join the table twice, the aggregates are off. Below is how you can replicate the results.
CREATE TABLE a (id int primary key);
CREATE TABLE b (a_id int, b_id int, date date, count int, primary key (a_id,b_id,date));
INSERT INTO a VALUES (1);
INSERT INTO b VALUES (1, 1, UTC_DATE(), 5);
INSERT INTO b VALUES (1, 2, UTC_DATE(), 10);
INSERT INTO b VALUES (1, 1, UTC_DATE()-1, 7);
INSERT INTO b VALUES (1, 2, UTC_DATE()-1, 12);
SELECT A.id,SUM(B.count) AS total_count,SUM(Y.count) AS y FROM a AS A
LEFT JOIN b AS B ON (B.a_id=A.id)
LEFT JOIN b AS Y ON (Y.a_id=A.id AND Y.date=UTC_DATE()-1)
GROUP BY A.id;
Results in:
+----+-------------+------+
| id | total_count | y |
+----+-------------+------+
| 1 | 68 | 76 |
+----+-------------+------+
The correct result should be:
+----+-------------+------+
| id | total_count | y |
+----+-------------+------+
| 1 | 34 | 22 |
+----+-------------+------+
What's going on here? Is this a bug in mysql or am I not understanding how the joins are working.
No, it's not a bug in MySQL.
Your JOIN conditions are generating "duplicate" rows. (Remove the aggregate functions and the GROUP BY, and you'll see what's happening.
That row from table "a" is matching four rows from table "b". That's all fine and good. But when you add the join to the third table ("y"), each row returned from that third "y" table (two rows) is being "matched" to every row from the "b" table... so you wind up with a total of eight rows in your result set. (That's why the "total_count" is getting doubled.)
To get the result set you specify, you don't need to join that table "b" second time. Instead, just use a conditional test to determine whether that "count" should be included in the "y" total or not.
e.g.
SELECT a.id
, SUM(b.count) AS total_count
, SUM(IF(b.date=UTC_DATE()-1 ,b.count,0)) AS y
FROM a a
LEFT
JOIN b b ON (b.a_id=a.id)
GROUP BY a.id;
Note that the MySQL IF
expression can be replaced with an equivalent ANSI CASE
expression for improved portability:
, SUM(CASE WHEN b.date=UTC_DATE()-1 THEN b.count ELSE 0 END) AS y
If you did want to do JOIN to that "b" table a second time, you would want the JOIN condition to be such that a row from "y" would match, at most, ONE row from "b", so as not to introduce any duplicates. So you'd basically need the join condition to include all of the columns in the primary key.
(Note that the predicates in the join condition for table "y" guarantee that each from from "y" will match no more than ONE row from "b"):
SELECT a.id
, SUM(b.count) AS total_count
, SUM(y.count) AS y
FROM a a
LEFT
JOIN b b
ON b.a_id=a.id
LEFT
JOIN b y
ON y.a_id = b.a_id
AND y.b_id = b.b_id
AND y.date = b.date
AND y.date = UTC_DATE()-1
GROUP BY a.id;
(To get the first statement to return an identical resultset, with a potential NULL in place of a zero, you'd need to replace the '0' constant in the IF expression with 'NULL'.
, SUM(IF(b.date=UTC_DATE()-1 ,b.count,NULL)) AS y
SELECT A.id,b_count AS total_count,y_count as y
FROM a AS A
LEFT JOIN (select a_id,SUM(B.Count) b_count from b
group by B.A_id) AS B1 ON (B1.a_id=A.id)
LEFT JOIN (select a_id,SUM(Count) y_count from b
where date=UTC_DATE()-1
group by B.A_id) AS Y ON (Y.a_id=A.id)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With