Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to avoid duplicates in sql query across three joined tables

Tags:

sql

I'm getting duplicates when I do two LEFT JOINs to get to the "event_name" in my example below. I get 112 cases with it set up this way. However, if I get rid of the 2 LEFT JOIN lines and run the query, I get the proper 100 records without duplicates. I tried DISTINCT with the code below, but I still get 112 with duplicates.

SELECT "cases"."id", "cases"."date", "cases"."name", "event"."event_name" 
FROM "cases"
LEFT JOIN "middle_table" ON "cases"."serial" = "middle_table"."m_serial"
LEFT JOIN "event" ON "middle_table"."e_serial" = "event"."ev_serial"
WHERE "cases"."date" BETWEEN '2012-12-11' AND '2012-12-13'

How can I specify that I only want the exact 100 cases from "cases", and that I don't want anything from the tables in the joins to produce any more rows?

Thanks!

like image 828
Chain Avatar asked Dec 19 '12 00:12

Chain


3 Answers

You need to extend your ON clauses to include a condition so that for each entry in cases there is only one entry in middle_table that matches the condition and that for each entry in middle_table there is only one entry in event:

LEFT JOIN middle_table ON cases.serial = middle_table.m_serial AND some_condition

You can of course use DISTINCT. If that doesn't work it means that your results are all different in the fields cases.id, cases.date, cases.name and event.event_name. Examine the results and decide which of the entries you want to throw away and include that condition in your ON clause.

like image 124
AndreKR Avatar answered Nov 17 '22 11:11

AndreKR


The issue is you have multiple matches in the tables you're left joining with. Effectively your code says:

select *
from parent
left outer join child on parent.id = child.parentId

If a parent has two children, you get both; so the parent appears twice.

If you want to only get the parent once you need to compromise; you can't have both children. Either perform an aggregate function on columns from the child table and do a group by on columns from the parent table, or use rownumber() over partition by (list,of,parent,columns order by list,of,child,columns) r in an inner statement and where r=1 in an outer statement, such as below:

select p.id, p.name, max(c.id), max(c.name) --nb: child id and name may come from different records
from parent p
left outer join child c on parent.id = child.parentId
group by p.id, p.name

or

select *
from 
(
    select p.id, p.name, c.id, c.name
    , rownumber() over (partition by p.id order by c.id desc) r
    from parent p
    left outer join child c on parent.id = child.parentId
) x
where x.r = 1

UPDATE

As mentioned in the comments, if the child data is exactly the same you can do this:

select p.id, p.name, c.name
from parent p
left outer join 
(
    select distinct c.parentId, c.name
    from child
) c on parent.id = child.parentId

or (if a few fields are different but you don't care which you get)

select p.id, p.name, c.id, c.name
from parent p
left outer join 
(
    select max(c.id) id, c.parentId, c.name
    from child
    group by c.parentId, c.name
) c on parent.id = child.parentId
like image 21
JohnLBevan Avatar answered Nov 17 '22 11:11

JohnLBevan


The duplicates are the result of having multiple fields for "middle_table" and "event" for "cases". You can limit the selections to the values that are unique by using the "GROUP BY" keyword (which is usually used for collating functions, such as COUNT and SUM), as follows:

SELECT "cases"."id", "cases"."date", "cases"."name", "event"."event_name" 
FROM "cases"
LEFT JOIN "middle_table" ON "cases"."serial" = "middle_table"."m_serial"
LEFT JOIN "event" ON "middle_table"."e_serial" = "event"."ev_serial"
GROUP BY  "cases"."id", "cases"."date", "cases"."name", "event"."event_name" 
WHERE "cases"."date" BETWEEN '2012-12-11' AND '2012-12-13'
like image 1
Michael Durrant Avatar answered Nov 17 '22 11:11

Michael Durrant