Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to prevent duplicates with inner join query (Postgres)

I am trying to understand how to create a query to filter out some results based on an inner join.

Consider the following data:

formulation_batch
-----
id  project_id  name    
1   1           F1.1
2   1           F1.2
3   1           F1.3
4   1           F1.all

formulation_batch_component
-----
id  formulation_batch_id    component_id
1   1                       1
2   2                       2
3   3                       3
4   4                       1
5   4                       2
6   4                       3
7   4                       4

I would like to select all formulation_batch records with a project_id of 1, and has a formulation_batch_component with a component_id of 1 or 2. So I run the following query:

SELECT formulation_batch.* 
FROM formulation_batch 
INNER JOIN formulation_batch_component
ON formulation_batch.id = formulation_batch_component.formulation_batch_id
WHERE formulation_batch.project_id = 1 
    AND ((formulation_batch_component.component_id = 2 
        OR formulation_batch_component.component_id = 1 ))

However, this returns a duplicate entry:

1;"F1.1"
2;"F1.2"
4;"F1.all"
4;"F1.all"

Is there a way to modify this query so that I only get back the unique formulation_batch records which match the criteria?

EG:

1;"F1.1"
2;"F1.2"
4;"F1.all"

Thanks for your time!

like image 245
drkstr101 Avatar asked Jul 30 '13 23:07

drkstr101


People also ask

Does inner join remove duplicates?

Yes, if there are duplicate values.

How do I join two tables in SQL without duplicates?

The SQL UNION ALL operator is used to combine the result sets of 2 or more SELECT statements. It does not remove duplicate rows between the various SELECT statements (all rows are returned). Each SELECT statement within the UNION ALL must have the same number of fields in the result sets with similar data types.


2 Answers

In this case it is possible to apply the distinct before the join possibly making it more performant:

select fb.* 
from
    formulation_batch fb
    inner join
    (
        select distinct formulationbatch_id
        from formulation_batch_component
        where component_id in (1, 2)
    ) fbc on fb.id = fbc.formulationbatch_id 
where fb.project_id = 1

Notice how to use alias for the table names to make the query clearer. Also then in operator is very handy. The use of double quotes with those identifiers is not necessary.

like image 148
Clodoaldo Neto Avatar answered Oct 01 '22 03:10

Clodoaldo Neto


One way would be to use distinct:

SELECT distinct "formulation_batch".* 
FROM "formulation_batch" 
INNER JOIN "formulation_batch_component" 
ON "formulation_batch"."id" = "formulation_batch_component"."formulationBatch_id" 
WHERE "formulation_batch"."project_id" = 1 
    AND (("formulation_batch_component"."component_id" = 2 
        OR "formulation_batch_component"."component_id" = 1 ))
like image 44
Gordon Linoff Avatar answered Oct 01 '22 03:10

Gordon Linoff