Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why have I got an increased row count after LEFT JOIN

Im trying to JOIN together various datasets to get a master table, though I managed to preserve the row count after 3 left joins, but after the next step, it seems to increase it. Any ideas why?

Query with 3 Joins

-------------------------------------------------------------------
--- STEP 4:    ----------------
-------------------------------------------------------------------
SELECT
DISTINCT        Table1.[Field1]
        ,   Table1.[Field2]
        ,   Table3.[Field3]
        ,   Table1.[Field4]
        ,   Table1.[Field5]
        ,   Table1.[Field6]
        ,   Table1.[Field7]
        ,   Table1.[Field8]
        ,   Table1.[Field9]
        ,   Table1.[Field10]
FROM        db1.dbo.raw_tbl_1 AS Table1
LEFT JOIN   db2.dbo.tbl_2 Table2
ON          Table1.Field7 = Table2.[Field13]
    LEFT JOIN   db2.dbo.tbl_3 Table3
    ON          CONVERT(INT,Table1.[Field2]) = Table3.Field14
        LEFT JOIN   db2.dbo.tbl_4Table4
        ON          Table2.Field17= Table4. Field15
WHERE       Table2. Field17 IS NOT NULL
-- 2682270 rows (Desired row count)

Query with 4 Joins (The one that increases row count)

-------------------------------------------------------------------
--- STEP 5:    ----
-------------------------------------------------------------------
SELECT
DISTINCT        Table1.[Field1]
        ,   Table1.[Field2]
        ,   Table3.[Field3]
        ,   Table1.[Field4]
        ,   Table1.[Field5]
        ,   Table1.[Field6]
        ,   Table1.[Field7]
        ,   Table5.[Field11]
        ,   Table6.[Field12]
        ,   Table1.[Field8]
        ,   Table1.[Field9]
        ,   Table1.[Field10]
FROM        db1.dbo.raw_tbl_1 AS Table1
LEFT JOIN   db2.dbo.tbl_2 Table2
ON          Table1.Field7 = Table2.[Field13]
    LEFT JOIN   db2.dbo.tbl_3 Table3
    ON          CONVERT(INT,Table1.[Field2]) = Table3. Field14
        LEFT JOIN   db2.dbo.tbl_4 Table4
        ON          Table2. Field17= Table4. Field15
            LEFT JOIN   db2.dbo.tbl_5 Table5
            ON          Table4. Field18= Table5. Field16
                LEFT JOIN   db2.dbo.tbl_6 Table6
                ON          Table5.[Field11] = CONVERT(INT,Table6.[Table6])
WHERE       Table2.Field17 IS NOT NULL
like image 733
VishalJ Avatar asked Nov 21 '13 09:11

VishalJ


People also ask

Why inner join increases number of rows?

Inner Join can for sure return more records than the records of the table. Inner join returns the results based on the condition specified in the JOIN condition. If there are more rows that satisfy the condition (as seen in query 2), it will return you more results.

Does LEFT join return duplicate rows?

Again, if we perform a left outer join where date = date, each row from Table 5 will join on to every matching row from Table 4. However, in this case, the join will result in 4 rows of duplicate dates in the joined DataSet (see Table 6).

What happens Left join multiple matches?

A left join returns all rows from x, and all columns from x and y. If there are multiple matches between x and y, all match combinations are returned.

Does LEFT join remove rows?

Introduction to SQL LEFT JOIN clause The inner join clause eliminates the rows that do not match with a row of the other table. The left join, however, returns all rows from the left table whether or not there is a matching row in the right table.


1 Answers

If one of the tables in the LEFT JOIN has more than one corresponding value, it will create a new row. If you don't want this behaviour, you need to use an aggregating function and GROUP BY.

More specifically, if you make a query using only the last tables you joined (the ones that cause the new rows), you'll be able to find the duplicate rows and decide how you want to handle that.


Since you mention that the last join is causing the problem, it means that Table6 is returning more rows than you expect it to. You'll have to do something like:

    SELECT Table5.Field11, COUNT(Table6.Table6) AS row_count
    FROM Table5
    LEFT JOIN  db2.dbo.tbl_6 Table6
        ON Table5.[Field11] = CONVERT(INT,Table6.[Table6])
    GROUP BY Field11
    HAVING row_count > 1

(the HAVING clause assume you're expecting a 1 to 1 correspondence between the tables. If not, leave it out). You'll have to manually scan for where Table6 returns more rows than you'd expected, then either modify your query or delete data accordingly.

like image 100
Stephen O'Flynn Avatar answered Sep 23 '22 14:09

Stephen O'Flynn