I am getting duplicate values from a LEFT JOIN. I think this is the expected behavior, but it is not what I want.
I have three tables: person, department and contact.
person :
id bigint,
person_name character varying(255)
department :
person_id bigint,
department_name character varying(255)
contact :
person_id bigint,
phone_number character varying(255)
Sql Query :
SELECT p.id, p.person_name, d.department_name, c.phone_number
FROM person p
LEFT JOIN department d
ON p.id = d.person_id
LEFT JOIN contact c
ON p.id = c.person_id;
Result :
id|person_name|department_name|phone_number
--+-----------+---------------+------------
1 |"John" |"Finance" |"023451"
1 |"John" |"Finance" |"99478"
1 |"John" |"Finance" |"67890"
1 |"John" |"Marketing" |"023451"
1 |"John" |"Marketing" |"99478"
1 |"John" |"Marketing" |"67890"
2 |"Barbara" |"Finance" |""
3 |"Michelle" |"" |"005634"
I know this is what joins do: every row is multiplied by its matches in the other tables. But the result gives the impression that the phone numbers 023451, 99478 and 67890 belong to both departments, while they are only related to the person John. The unnecessary repeated values will also make the problem worse with a larger data set.
So, here is what I want:
id|person_name|department_name|phone_number
--+-----------+---------------+------------
1 |"John" |"Finance" |"023451"
1 |"John" |"Marketing" |"99478"
1 |"John" |"" |"67890"
2 |"Barbara" |"Finance" |""
3 |"Michelle" |"" |"005634"
This is a simplified sample of my situation. I am working with a large set of tables and queries, so I need a generic solution.
I like to call this problem "cross join by proxy". Since there is no information (WHERE or JOIN condition) about how the tables department and contact are supposed to match up, they are cross-joined via the proxy table person, giving you the Cartesian product. Very similar to this one:
More explanation there.
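To make the multiplication visible, here is a small sketch, assuming PostgreSQL. The sample rows are reconstructed from the result set in the question, and the alias n and the column joined_rows are only illustrative names. It counts the rows per person in each table and multiplies the counts, which is the number of rows the double LEFT JOIN returns for that person.

```sql
-- Sample data reconstructed from the result set shown in the question.
CREATE TEMP TABLE person     (id bigint, person_name varchar(255));
CREATE TEMP TABLE department (person_id bigint, department_name varchar(255));
CREATE TEMP TABLE contact    (person_id bigint, phone_number varchar(255));

INSERT INTO person     VALUES (1, 'John'), (2, 'Barbara'), (3, 'Michelle');
INSERT INTO department VALUES (1, 'Finance'), (1, 'Marketing'), (2, 'Finance');
INSERT INTO contact    VALUES (1, '023451'), (1, '99478'), (1, '67890'), (3, '005634');

-- Rows per person on each side, and their product.
-- A missing side still contributes one (NULL-extended) row, hence COALESCE(..., 1).
SELECT p.id,
       d.n AS departments,
       c.n AS phones,
       COALESCE(d.n, 1) * COALESCE(c.n, 1) AS joined_rows
FROM   person p
LEFT   JOIN (SELECT person_id, count(*) AS n FROM department GROUP BY person_id) d
       ON d.person_id = p.id
LEFT   JOIN (SELECT person_id, count(*) AS n FROM contact GROUP BY person_id) c
       ON c.person_id = p.id
ORDER  BY p.id;
```

For John this is 2 × 3 = 6, which matches the six rows in the question's result.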
Solution for your query:
SELECT p.id, p.person_name, d.department_name, c.phone_number
FROM person p
LEFT JOIN (
SELECT person_id, min(department_name) AS department_name
FROM department
GROUP BY person_id
) d ON d.person_id = p.id
LEFT JOIN (
SELECT person_id, min(phone_number) AS phone_number
FROM contact
GROUP BY person_id
) c ON c.person_id = p.id;
You did not define which department or phone number to pick, so I arbitrarily chose the minimum. You can have it any other way ...
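If the goal is the output shown in the question, where the nth department sits next to the nth phone number, one option is to number the rows of each table per person and join on that number. This is a sketch assuming PostgreSQL. The rn column and the ORDER BY choices are my own assumptions, since the question does not say which department should pair with which phone number.

```sql
-- Assumes the person, department and contact tables from the question (PostgreSQL).
SELECT p.id, p.person_name, x.department_name, x.phone_number
FROM   person p
LEFT   JOIN (
   -- Pair the nth department with the nth phone number of the same person.
   SELECT person_id, rn, d.department_name, c.phone_number
   FROM  (
      SELECT person_id, department_name,
             row_number() OVER (PARTITION BY person_id ORDER BY department_name) AS rn
      FROM   department
      ) d
   FULL   JOIN (
      SELECT person_id, phone_number,
             row_number() OVER (PARTITION BY person_id ORDER BY phone_number) AS rn
      FROM   contact
      ) c USING (person_id, rn)
   ) x ON x.person_id = p.id
ORDER  BY p.id, x.rn;
```

Each person gets as many rows as the larger of the two counts, and the shorter side is padded with NULL. The pairing itself is arbitrary: it follows whatever ORDER BY is used inside each row_number() call.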
I think you just need to get lists of departments and phones for a particular person. So just use array_agg (or string_agg or json_agg):
SELECT
p.id,
p.person_name,
-- DISTINCT drops the repeats caused by joining both tables at once
array_agg(DISTINCT d.department_name) AS "department_names",
array_agg(DISTINCT c.phone_number) AS "phone_numbers"
FROM person AS p
LEFT JOIN department AS d ON p.id = d.person_id
LEFT JOIN contact AS c ON p.id = c.person_id
GROUP BY p.id, p.person_name;
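Aggregating after both joins still operates on the multiplied rows. A variation that sidesteps the multiplication is to aggregate each table on its own first and join the results afterwards. This is a sketch assuming PostgreSQL, and the ORDER BY inside array_agg is just one possible choice.

```sql
-- Assumes the person, department and contact tables from the question (PostgreSQL).
SELECT p.id, p.person_name, d.department_names, c.phone_numbers
FROM   person p
LEFT   JOIN (
   -- One row per person, so the join cannot multiply anything.
   SELECT person_id,
          array_agg(department_name ORDER BY department_name) AS department_names
   FROM   department
   GROUP  BY person_id
   ) d ON d.person_id = p.id
LEFT   JOIN (
   SELECT person_id,
          array_agg(phone_number ORDER BY phone_number) AS phone_numbers
   FROM   contact
   GROUP  BY person_id
   ) c ON c.person_id = p.id
ORDER  BY p.id;
```

A person with no rows in one of the tables gets NULL in that column rather than an empty array.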
Although the tables are obviously simplified for discussion, it appears they are structurally flawed. Tables should be structured to show relationships between entities, rather than be merely lists of entities and/or attributes. And I would consider a phone number to be an attribute (of a person or department entity) in this case.
The first step would be to create tables with relationships, each having a primary key and possibly a foreign key. In this example, it would be helpful to have the person table use person_id for the primary key, and the department table to use department_id for its primary key. Next look for one-to-many or many-to-many relationships, and set your foreign keys accordingly:
To summarize, there should only be two tables in your scenario: one table for the person and the other table for the department. Even allowing for personal phone numbers (a column in the person table) and department phone numbers (a column in the department table), this would be a better approach.
The only caveat is when one department has many numbers (or more than one department shares a single phone number), but this would be beyond the scope of the original question.
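As a rough illustration of the two-table layout described above, here is a sketch in PostgreSQL syntax. The column names department_id and person_id follow the suggestion in this answer. The placement of phone_number and the nullability choices are assumptions, not something stated in the question.

```sql
-- Hypothetical layout; everything beyond the question's columns is an assumption.
CREATE TABLE department (
    department_id   bigint PRIMARY KEY,
    department_name varchar(255) NOT NULL,
    phone_number    varchar(255)              -- department number, if any
);

CREATE TABLE person (
    person_id     bigint PRIMARY KEY,
    person_name   varchar(255) NOT NULL,
    phone_number  varchar(255),               -- personal number, if any
    -- Foreign key: each person points at no more than one department.
    department_id bigint REFERENCES department (department_id)
);
```

With this layout a person belongs to at most one department. A person in several departments, like John in the question, would need an additional link table, which is not shown here.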
Use this type of query (SQL Server):
(The tables as posted have no id column, so this version orders by department_name and phone_number. You can change the ORDER BY column to any column you want.)
SELECT
p.id,
p.person_name,
d.department_name,
c.phone_number
FROM
person p
LEFT JOIN
(SELECT *, ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY department_name) AS seq
FROM department) d
ON d.person_id = p.id AND d.seq = 1
LEFT JOIN
(SELECT *, ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY phone_number) AS seq
FROM contact) c
ON c.person_id = p.id AND c.seq = 1;