I'm currently facing the following problem: I have 3 tables I need information from and both of these joins are one to many. For some reason second join creates duplicates of rows and as a result second return value gets messed up (bb.count gets multiplied by the amount of second join rows)
SELECT aa.id, sum(bb.count), count(DISTINCT cc.id)
FROM aaaa aa
LEFT JOIN bbbb bb ON bb.aa_id = aa.id
LEFT JOIN cccc cc ON cc.bb_id = bb.id
GROUP BY aa.id
Is there a way to get the proper sum of bb.count without another query? The moment I remove second left join everything's fine, unfortunately I need it for the third return value and I can't group them without resulting in a duplicate (sort of) rows in result.
Lets say there's
bb1.count = 9
bb2.count = 5
And there's 2 rows where cc.bb_id = bb1.id
The result I get is 23 instead of 14.
Solution. Select column values in a specific order within rows to make rows with duplicate sets of values identical. Then you can use SELECT DISTINCT to remove duplicates. Alternatively, retrieve rows in such a way that near-duplicates are not even selected.
Avoiding Duplicates Again, if we perform a left outer join where date = date, each row from Table 5 will join on to every matching row from Table 4. However, in this case, the join will result in 4 rows of duplicate dates in the joined DataSet (see Table 6).
Yes, if there are duplicate values.
You're experiencing aggregate fanout in the above query.
This happens because there is
aaa
& bbb
bbb
& ccc
The latter join creates M
duplicates for rows that exist in bbb
if they are joined to M rows via the join to ccc
To fix the error, split the query up into two CTEs & join the result.
WITH agg_bb AS (
SELECT aa.id, sum(bb.count)
FROM aaaa aa
LEFT JOIN bbbb bb ON bb.aa_id = aa.id
GROUP BY aa.id
)
, agg_cc AS (SELECT aa.id, count(DISTINCT cc.id)
FROM aaaa aa
LEFT JOIN bbbb bb ON bb.aa_id = aa.id
LEFT JOIN cccc cc ON cc.bb_id = bb.id
GROUP BY aa.id
)
SELECT * FROM agg_bb JOIN agg_cc USING (id)
In general, to avoid fan outs, only apply aggregate operations to the columns from the rightmost relation in a series of joins. If you find you're aggregating columns from the middle tables, split the query up as I have done above. Only the following functions are invariant across a fan out: COUNT DISTINCT
, MIN
, MAX
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With