I'm getting duplicates when I do two LEFT JOINs to get to the "event_name" in my example below. I get 112 cases with it set up this way. However, if I get rid of the 2 LEFT JOIN lines and run the query, I get the proper 100 records without duplicates. I tried DISTINCT with the code below, but I still get 112 with duplicates.
SELECT "cases"."id", "cases"."date", "cases"."name", "event"."event_name"
FROM "cases"
LEFT JOIN "middle_table" ON "cases"."serial" = "middle_table"."m_serial"
LEFT JOIN "event" ON "middle_table"."e_serial" = "event"."ev_serial"
WHERE "cases"."date" BETWEEN '2012-12-11' AND '2012-12-13'
How can I specify that I only want the exact 100 cases from "cases", and that I don't want anything from the tables in the joins to produce any more rows?
Thanks!
You need to extend your ON clauses to include a condition so that for each entry in cases
there is only one entry in middle_table
that matches the condition and that for each entry in middle_table
there is only one entry in event
:
LEFT JOIN middle_table ON cases.serial = middle_table.m_serial AND some_condition
You can of course use DISTINCT. If that doesn't work it means that your results are all different in the fields cases.id
, cases.date
, cases.name
and event.event_name
. Examine the results and decide which of the entries you want to throw away and include that condition in your ON clause.
The issue is you have multiple matches in the tables you're left joining with. Effectively your code says:
select *
from parent
left outer join child on parent.id = child.parentId
If a parent has two children, you get both; so the parent appears twice.
If you want to only get the parent once you need to compromise; you can't have both children. Either perform an aggregate function on columns from the child table and do a group by on columns from the parent table, or use rownumber() over partition by (list,of,parent,columns order by list,of,child,columns) r
in an inner statement and where r=1
in an outer statement, such as below:
select p.id, p.name, max(c.id), max(c.name) --nb: child id and name may come from different records
from parent p
left outer join child c on parent.id = child.parentId
group by p.id, p.name
or
select *
from
(
select p.id, p.name, c.id, c.name
, rownumber() over (partition by p.id order by c.id desc) r
from parent p
left outer join child c on parent.id = child.parentId
) x
where x.r = 1
UPDATE
As mentioned in the comments, if the child data is exactly the same you can do this:
select p.id, p.name, c.name
from parent p
left outer join
(
select distinct c.parentId, c.name
from child
) c on parent.id = child.parentId
or (if a few fields are different but you don't care which you get)
select p.id, p.name, c.id, c.name
from parent p
left outer join
(
select max(c.id) id, c.parentId, c.name
from child
group by c.parentId, c.name
) c on parent.id = child.parentId
The duplicates are the result of having multiple fields for "middle_table" and "event" for "cases". You can limit the selections to the values that are unique by using the "GROUP BY" keyword (which is usually used for collating functions, such as COUNT and SUM), as follows:
SELECT "cases"."id", "cases"."date", "cases"."name", "event"."event_name"
FROM "cases"
LEFT JOIN "middle_table" ON "cases"."serial" = "middle_table"."m_serial"
LEFT JOIN "event" ON "middle_table"."e_serial" = "event"."ev_serial"
GROUP BY "cases"."id", "cases"."date", "cases"."name", "event"."event_name"
WHERE "cases"."date" BETWEEN '2012-12-11' AND '2012-12-13'
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With