Why do multiple-table joins produce duplicate rows?

Tags:

Let's say I have three tables A, B, and C. Each has two columns: a primary key and some other piece of data. They each have the same number of rows. If I JOIN A and B on the primary key, I should end up with the same number of rows as are in either of them (as opposed to A.rows * B.rows).

Now, if I JOIN A JOIN B with C, why do I end up with duplicate rows? I have run into this problem on several occasions and I do not understand it. It seems like it should produce the same result as JOINing A and B since it has the same number of rows but, instead, duplicates are produced.

Queries that produce results like this are of the format

SELECT * FROM M     INNER JOIN S         on M.mIndex = S.mIndex     INNER JOIN D         ON M.platformId LIKE '%' + D.version + '%'     INNER JOIN H         ON D.Name = H.Name         AND D.revision = H.revision

Here are schemas for the tables. H contains is a historic table containing everything that was ever in D. There are many M rows for each D and one S for each M.

Table M

    [mIndex] [int] NOT NULL PRIMARY KEY,     [platformId] [nvarchar](256) NULL,     [ip] [nvarchar](64) NULL,     [complete] [bit] NOT NULL,     [date] [datetime] NOT NULL,     [DeployId] [int] NOT NULL PRIMARY KEY REFERENCES D.DeployId,     [source] [nvarchar](64) NOT NULL PRIMARY KEY

Table S

[order] [int] NOT NULL PRIMARY KEY, [name] [nvarchar](64) NOT NULL, [parameters] [nvarchar](256) NOT NULL, [Finished] [bit] NOT NULL, [mIndex] [int] NOT NULL PRIMARY KEY, [mDeployId] [int] NOT NULL PRIMARY KEY, [Date] [datetime] NULL, [status] [nvarchar](10) NULL, [output] [nvarchar](max) NULL, [config] [nvarchar](64) NOT NULL PRIMARY KEY

Table D

[Id] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY, [branch] [nvarchar](64) NOT NULL, [revision] [int] NOT NULL, [version] [nvarchar](64) NOT NULL, [path] [nvarchar](256) NOT NULL

Table H

[IdDeploy] [int] IDENTITY(1,1) NOT NULL, [name] [nvarchar](64) NOT NULL, [version] [nvarchar](64) NOT NULL, [path] [nvarchar](max) NOT NULL, [StartDate] [datetime] NOT NULL, [EndDate] [datetime] NULL, [Revision] [nvarchar](64) NULL,

I didn't post the tables and query initially because I am more interested in understanding this problem for myself and avoiding it in the future.

522

asked May 21 '14 14:05

sirdank

1 Answers

When you have related tables you often have one-to-many or many-to-many relationships. So when you join to TableB each record in TableA many have multiple records in TableB. This is normal and expected.

Now at times you only need certain columns and those are all the same for all the records, then you would need to do some sort of group by or distinct to remove the duplicates. Let's look at an example:

TableA Id Field1 1  test 2  another test  TableB ID Field2 field3 1  Test1  something 1  test1  More something 2  Test2  Anything

So when you join them and select all the files you get:

select *  from tableA a  join tableb b on a.id = b.id  a.Id a.Field1        b.id   b.field2  b.field3 1    test            1      Test1     something 1    test            1      Test1     More something 2    another test 2  2      Test2     Anything

These are not duplicates because the values of Field3 are different even though there are repeated values in the earlier fields. Now when you only select certain columns the same number of records are being joined together but since the columns with the different information is not being displayed they look like duplicates.

select a.Id, a.Field1,  b.field2 from tableA a  join tableb b on a.id = b.id  a.Id a.Field1       b.field2   1    test           Test1      1    test           Test1  2    another test   Test2

This appears to be duplicates but it is not because of the multiple records in TableB.

You normally fix this by using aggregates and group by, by using distinct or by filtering in the where clause to remove duplicates. How you solve this depends on exactly what your business rule is and how your database is designed and what kind of data is in there.

123

answered Oct 19 '22 17:10

HLGEM

Related questions
                            
                                Converting java date to Sql timestamp
                            
                                How to select rows for a specific date, ignoring time in SQL Server
                            
                                JDBC ResultSet get columns with table alias
                            
                                How to query for null values in json field type postgresql?
                            
                                Does the order of fields in a WHERE clause affect performance in MySQL?
                            
                                Stop (long) running SQL query in PostgreSQL when session or requests no longer exist?
                            
                                How can I allow django admin to set a field to NULL?
                            
                                Is there a difference between a SELECT statement inside a transaction and one that is outside of it?
                            
                                Oracle timestamp data type
                            
                                MySql Inner Join with WHERE clause [closed]
                            
                                What does SQL Select symbol || mean?
                            
                                How do I update an android sqlite database column value to null using ContentValues?
                            
                                GROUP BY but get all values from other column
                            
                                Concatenate multiple rows in an array with SQL on PostgreSQL
                            
                                SQL ORDER BY date problem
                            
                                T-SQL Conditional Order By
                            
                                How do I view the Explain Plan in Oracle Sql developer?
                            
                                Determine what user created objects in SQL Server
                            
                                Does it make sense to use an index that will have a low cardinality?
                            
                                Difference between IN and ANY operators in SQL

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Why do multiple-table joins produce duplicate rows?

Tags:

sql

join

sirdank

People also ask

1 Answers

HLGEM

Recent Activity

Donate For Us