Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL duplicate rows with multiple left joins

I'm currently facing the following problem: I have 3 tables I need information from and both of these joins are one to many. For some reason second join creates duplicates of rows and as a result second return value gets messed up (bb.count gets multiplied by the amount of second join rows)

SELECT aa.id, sum(bb.count), count(DISTINCT cc.id)
FROM aaaa aa
LEFT JOIN bbbb bb ON bb.aa_id = aa.id
LEFT JOIN cccc cc ON cc.bb_id = bb.id
GROUP BY aa.id

Is there a way to get the proper sum of bb.count without another query? The moment I remove second left join everything's fine, unfortunately I need it for the third return value and I can't group them without resulting in a duplicate (sort of) rows in result.

Lets say there's

bb1.count = 9
bb2.count = 5

And there's 2 rows where cc.bb_id = bb1.id The result I get is 23 instead of 14.

like image 484
Rauno Avatar asked Sep 18 '17 12:09

Rauno


People also ask

How do I prevent duplicate rows from joining multiple tables?

Solution. Select column values in a specific order within rows to make rows with duplicate sets of values identical. Then you can use SELECT DISTINCT to remove duplicates. Alternatively, retrieve rows in such a way that near-duplicates are not even selected.

Does left outer join remove duplicates?

Avoiding Duplicates Again, if we perform a left outer join where date = date, each row from Table 5 will join on to every matching row from Table 4. However, in this case, the join will result in 4 rows of duplicate dates in the joined DataSet (see Table 6).

Does inner join remove duplicates?

Yes, if there are duplicate values.


1 Answers

You're experiencing aggregate fanout in the above query.

This happens because there is

  • either a 1-1 or 1-N join between aaa & bbb
  • there is a 1-N join between bbb & ccc

The latter join creates M duplicates for rows that exist in bbb if they are joined to M rows via the join to ccc

To fix the error, split the query up into two CTEs & join the result.

WITH agg_bb AS (
SELECT aa.id, sum(bb.count)
FROM aaaa aa
LEFT JOIN bbbb bb ON bb.aa_id = aa.id
GROUP BY aa.id
)
, agg_cc AS (SELECT aa.id, count(DISTINCT cc.id)
FROM aaaa aa
LEFT JOIN bbbb bb ON bb.aa_id = aa.id
LEFT JOIN cccc cc ON cc.bb_id = bb.id
GROUP BY aa.id
)
SELECT * FROM agg_bb JOIN agg_cc USING (id)

In general, to avoid fan outs, only apply aggregate operations to the columns from the rightmost relation in a series of joins. If you find you're aggregating columns from the middle tables, split the query up as I have done above. Only the following functions are invariant across a fan out: COUNT DISTINCT, MIN, MAX

like image 133
Haleemur Ali Avatar answered Oct 27 '22 04:10

Haleemur Ali