Prevent duplicate values in LEFT JOIN

Tags:

sql

join

I ran into a situation where a LEFT JOIN returns duplicate values. I think this is the expected behavior, but it is not what I want.

I have three tables: person, department and contact.

person :

id bigint,
person_name character varying(255)

department :

person_id bigint,
department_name character varying(255)

contact :

person_id bigint,
phone_number character varying(255)

SQL query:

SELECT p.id, p.person_name, d.department_name, c.phone_number 
FROM person p
  LEFT JOIN department d 
    ON p.id = d.person_id
  LEFT JOIN contact c 
    ON p.id = c.person_id;

Result :

id|person_name|department_name|phone_number
--+-----------+---------------+------------
1 |"John"     |"Finance"      |"023451"
1 |"John"     |"Finance"      |"99478"
1 |"John"     |"Finance"      |"67890"
1 |"John"     |"Marketing"    |"023451"
1 |"John"     |"Marketing"    |"99478"
1 |"John"     |"Marketing"    |"67890"
2 |"Barbara"  |"Finance"      |""
3 |"Michelle" |""             |"005634"
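
To reproduce this result, the data can be reconstructed roughly as follows. This is only a sketch inferred from the output above, not the original data, and it assumes the empty strings shown for Barbara and Michelle are how the client displays NULL (a LEFT JOIN with no match returns NULL).

```sql
-- Table definitions as given in the question
CREATE TABLE person (
    id          bigint,
    person_name character varying(255)
);
CREATE TABLE department (
    person_id       bigint,
    department_name character varying(255)
);
CREATE TABLE contact (
    person_id    bigint,
    phone_number character varying(255)
);

-- Rows inferred from the result set (assumption, not the asker's real data)
INSERT INTO person     VALUES (1, 'John'), (2, 'Barbara'), (3, 'Michelle');
INSERT INTO department VALUES (1, 'Finance'), (1, 'Marketing'), (2, 'Finance');
INSERT INTO contact    VALUES (1, '023451'), (1, '99478'), (1, '67890'), (3, '005634');
```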

I know this is what joins do: each row is multiplied by its matching rows. But the result makes it look as if the phone numbers 023451, 99478 and 67890 belong to both departments, while they are only related to the person John. The unnecessary repeated values will also make the problem worse with a larger data set.
So, here is what I want:

id|person_name|department_name|phone_number
--+-----------+---------------+------------
1 |"John"     |"Finance"      |"023451"
1 |"John"     |"Marketing"    |"99478"
1 |"John"     |""             |"67890"
2 |"Barbara"  |"Finance"      |""
3 |"Michelle" |""             |"005634"

This is a simplified sample of my situation; the real case involves a large set of tables and queries, so I need a generic solution.

asked May 23 '15 08:05

Gautam Kumar Samal

4 Answers

I like to call this problem "cross join by proxy". Since there is no information (a WHERE or JOIN condition) on how the tables department and contact are supposed to match up, they are cross-joined via the proxy table person, giving you the Cartesian product of each person's departments and phone numbers. Very similar to this one:

Two SQL LEFT JOINS produce incorrect result

More explanation there.
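
The multiplication can be made visible with a quick count per person. This is a small diagnostic sketch against the tables from the question; the column aliases are made up for illustration.

```sql
-- Compare the number of joined rows with the number of distinct values
-- contributed by each side of the "proxy" table.
SELECT p.id
     , count(*)                          AS joined_rows
     , count(DISTINCT d.department_name) AS departments
     , count(DISTINCT c.phone_number)    AS phone_numbers
FROM   person p
LEFT   JOIN department d ON d.person_id = p.id
LEFT   JOIN contact    c ON c.person_id = p.id
GROUP  BY p.id
ORDER  BY p.id;
```

For John in the question's sample, this reports 6 joined rows built from 2 departments and 3 phone numbers, which is exactly the 2 x 3 product.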

Solution for your query:

SELECT p.id, p.person_name, d.department_name, c.phone_number
FROM   person p
LEFT   JOIN (
   SELECT person_id, min(department_name) AS department_name
   FROM   department
   GROUP  BY person_id
   ) d ON d.person_id = p.id
LEFT   JOIN (
   SELECT person_id, min(phone_number) AS phone_number
   FROM   contact
   GROUP  BY person_id
   ) c ON c.person_id = p.id;

You did not define which department or phone number to pick, so I arbitrarily chose the minimum. You can have it any other way ...
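
As a sketch of one other way that keeps every department and every phone number instead of one each: number the rows per person in both tables and pair them up by position. This assumes PostgreSQL (as the column types in the question suggest), and the ORDER BY columns inside row_number() are an assumption, since the question does not say which department should sit next to which phone number.

```sql
SELECT p.id, p.person_name, x.department_name, x.phone_number
FROM   person p
LEFT   JOIN (
   -- Pair the n-th department with the n-th phone number of the same person.
   -- FULL JOIN keeps the leftovers when one side has more rows than the other.
   SELECT person_id, rn, d.department_name, c.phone_number
   FROM  (
      SELECT person_id, department_name
           , row_number() OVER (PARTITION BY person_id
                                ORDER BY department_name) AS rn  -- assumed order
      FROM   department
      ) d
   FULL   JOIN (
      SELECT person_id, phone_number
           , row_number() OVER (PARTITION BY person_id
                                ORDER BY phone_number) AS rn     -- assumed order
      FROM   contact
      ) c USING (person_id, rn)
   ) x ON x.person_id = p.id
ORDER  BY p.id, x.rn;
```

Each person then gets as many rows as the larger of their two lists, with NULL filling the shorter side, rather than the product of both. Which values end up on the same row is purely positional and carries no meaning beyond the chosen sort order.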

answered Sep 24 '22 12:09

Erwin Brandstetter

I think you just need lists of the departments and phone numbers for each person. So just use array_agg (or string_agg or json_agg):

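SELECT
    p.id,
    p.person_name,
    -- DISTINCT is needed: the two joins multiply rows, so without it
    -- every department repeats once per phone number and vice versa
    array_agg(DISTINCT d.department_name) AS "department_names",
    array_agg(DISTINCT c.phone_number) AS "phone_numbers"
FROM person AS p
LEFT JOIN department AS d ON p.id = d.person_id
LEFT JOIN contact AS c ON p.id = c.person_id
GROUP BY p.id, p.person_name;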
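
An alternative sketch aggregates each table on its own before joining, so neither list can be inflated by rows from the other join, and a person without any match gets NULL rather than an array holding a single NULL. It assumes PostgreSQL and the tables from the question.

```sql
SELECT
    p.id,
    p.person_name,
    d.department_names,
    c.phone_numbers
FROM person AS p
LEFT JOIN (
    -- One row per person: all of that person's departments
    SELECT person_id, array_agg(department_name) AS department_names
    FROM department
    GROUP BY person_id
) AS d ON d.person_id = p.id
LEFT JOIN (
    -- One row per person: all of that person's phone numbers
    SELECT person_id, array_agg(phone_number) AS phone_numbers
    FROM contact
    GROUP BY person_id
) AS c ON c.person_id = p.id;
```

Because each subquery returns at most one row per person, the outer query needs no GROUP BY at all.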

answered Sep 20 '22 12:09

alexpods

Although the tables are obviously simplified for discussion, it appears they are structurally flawed. Tables should be structured to show relationships between entities, rather than be merely lists of entities and/or attributes. And I would consider a phone number to be an attribute (of a person or department entity) in this case.

The first step would be to create tables with relationships, each having a primary key and possibly a foreign key. In this example, it would be helpful to have the person table use person_id for the primary key, and the department table to use department_id for its primary key. Next look for one-to-many or many-to-many relationships, and set your foreign keys accordingly:

If one person can only be in one department at a time, then you have a one(department)-to-many(persons) relationship. There is no foreign key in the department table, but department_id will be a foreign key in the person table.
If one person can be in more than one department, then you have a many-to-many relationship, and you'll need an additional junction table with person_id and department_id as foreign keys.

To summarize, there should only be two tables in your scenario: one table for the person and the other table for the department. Even allowing for personal phone numbers (a column in the person table) and department numbers (a column in the department table), this would be a better approach.

The only caveat is when one department has many numbers (or more than one department shares a single phone number), but this would be beyond the scope of the original question.
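
The structure described above could look roughly like this in PostgreSQL. It is a sketch only: the junction table name person_department and the phone_number columns are illustrative assumptions, and the junction table is needed only in the many-to-many case.

```sql
CREATE TABLE person (
    person_id    bigint PRIMARY KEY,
    person_name  character varying(255),
    phone_number character varying(255)  -- personal number as an attribute
    -- One-to-many case: add a department_id column here that references
    -- department (department_id), created after the department table.
);

CREATE TABLE department (
    department_id   bigint PRIMARY KEY,
    department_name character varying(255),
    phone_number    character varying(255)  -- department number as an attribute
);

-- Many-to-many case only: one row per (person, department) membership.
CREATE TABLE person_department (
    person_id     bigint REFERENCES person (person_id),
    department_id bigint REFERENCES department (department_id),
    PRIMARY KEY (person_id, department_id)
);
```

With this layout a department name is stored once, and a query for a person's departments joins through person_department instead of repeating the name per person.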

answered Sep 24 '22 12:09

KiloVoltaire

Use this type of query (SQL Server).
(The ORDER BY inside ROW_NUMBER() decides which row is picked for each person. The tables as posted have no id column, so this version orders by department_name and phone_number; change that to whichever column you want.)

SELECT 
    p.id, 
    p.person_name, 
    d.department_name, 
    c.phone_number
FROM
    person p
    LEFT JOIN 
    (SELECT *, ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY department_name) AS seq
     FROM department) d 
    ON d.person_id = p.id AND d.seq = 1
    LEFT JOIN 
    (SELECT *, ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY phone_number) AS seq
     FROM contact) c 
    ON c.person_id = p.id AND c.seq = 1;

answered Sep 22 '22 12:09

shA.t
