Im trying to JOIN together various datasets to get a master table, though I managed to preserve the row count after 3 left joins, but after the next step, it seems to increase it. Any ideas why?
Query with 3 Joins
-------------------------------------------------------------------
--- STEP 4: ----------------
-------------------------------------------------------------------
SELECT
DISTINCT Table1.[Field1]
, Table1.[Field2]
, Table3.[Field3]
, Table1.[Field4]
, Table1.[Field5]
, Table1.[Field6]
, Table1.[Field7]
, Table1.[Field8]
, Table1.[Field9]
, Table1.[Field10]
FROM db1.dbo.raw_tbl_1 AS Table1
LEFT JOIN db2.dbo.tbl_2 Table2
ON Table1.Field7 = Table2.[Field13]
LEFT JOIN db2.dbo.tbl_3 Table3
ON CONVERT(INT,Table1.[Field2]) = Table3.Field14
LEFT JOIN db2.dbo.tbl_4Table4
ON Table2.Field17= Table4. Field15
WHERE Table2. Field17 IS NOT NULL
-- 2682270 rows (Desired row count)
Query with 4 Joins (The one that increases row count)
-------------------------------------------------------------------
--- STEP 5: ----
-------------------------------------------------------------------
SELECT
DISTINCT Table1.[Field1]
, Table1.[Field2]
, Table3.[Field3]
, Table1.[Field4]
, Table1.[Field5]
, Table1.[Field6]
, Table1.[Field7]
, Table5.[Field11]
, Table6.[Field12]
, Table1.[Field8]
, Table1.[Field9]
, Table1.[Field10]
FROM db1.dbo.raw_tbl_1 AS Table1
LEFT JOIN db2.dbo.tbl_2 Table2
ON Table1.Field7 = Table2.[Field13]
LEFT JOIN db2.dbo.tbl_3 Table3
ON CONVERT(INT,Table1.[Field2]) = Table3. Field14
LEFT JOIN db2.dbo.tbl_4 Table4
ON Table2. Field17= Table4. Field15
LEFT JOIN db2.dbo.tbl_5 Table5
ON Table4. Field18= Table5. Field16
LEFT JOIN db2.dbo.tbl_6 Table6
ON Table5.[Field11] = CONVERT(INT,Table6.[Table6])
WHERE Table2.Field17 IS NOT NULL
Inner Join can for sure return more records than the records of the table. Inner join returns the results based on the condition specified in the JOIN condition. If there are more rows that satisfy the condition (as seen in query 2), it will return you more results.
Again, if we perform a left outer join where date = date, each row from Table 5 will join on to every matching row from Table 4. However, in this case, the join will result in 4 rows of duplicate dates in the joined DataSet (see Table 6).
A left join returns all rows from x, and all columns from x and y. If there are multiple matches between x and y, all match combinations are returned.
Introduction to SQL LEFT JOIN clause The inner join clause eliminates the rows that do not match with a row of the other table. The left join, however, returns all rows from the left table whether or not there is a matching row in the right table.
If one of the tables in the LEFT JOIN has more than one corresponding value, it will create a new row. If you don't want this behaviour, you need to use an aggregating function and GROUP BY
.
More specifically, if you make a query using only the last tables you joined (the ones that cause the new rows), you'll be able to find the duplicate rows and decide how you want to handle that.
Since you mention that the last join is causing the problem, it means that Table6 is returning more rows than you expect it to. You'll have to do something like:
SELECT Table5.Field11, COUNT(Table6.Table6) AS row_count
FROM Table5
LEFT JOIN db2.dbo.tbl_6 Table6
ON Table5.[Field11] = CONVERT(INT,Table6.[Table6])
GROUP BY Field11
HAVING row_count > 1
(the HAVING
clause assume you're expecting a 1 to 1 correspondence between the tables. If not, leave it out). You'll have to manually scan for where Table6 returns more rows than you'd expected, then either modify your query or delete data accordingly.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With