Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

FULL OUTER JOIN with filtered data

I seem to have run into a bit of a dilemma on how to best solve this requirement. I realize the issue is very closely related to the following other questions:

  • Problem with Full Outer Join not working as expected
  • What is the difference when adding a filter criteria to an outer join instead of a where clause?
  • Oracle outer join not working as expected
  • and probably many more...

The added question is that I'd wonder what the general opinion is on how to work around the problem.

IF OBJECT_ID('tempdb..#A') IS NOT NULL DROP TABLE #A
IF OBJECT_ID('tempdb..#B') IS NOT NULL DROP TABLE #B

GO
CREATE TABLE #A (key1   int NOT NULL PRIMARY KEY,
                 value1 int NOT NULL,
                 value2 int NOT NULL,
                 is_even AS (CASE WHEN key1 % 2 = 0 THEN 1 ELSE 0 END))

CREATE TABLE #B (key1   int NOT NULL PRIMARY KEY,
                 value1 int NOT NULL,
                 value2 int NOT NULL,
                 is_even AS (CASE WHEN key1 % 2 = 0 THEN 1 ELSE 0 END))

GO
-- dummy data
INSERT #A (key1, value1, value2)
SELECT TOP 10 key1   = ROW_NUMBER() OVER (ORDER BY x1.object_id),
              value1 = ROW_NUMBER() OVER (ORDER BY x1.object_id) % 7,
              value2 = ROW_NUMBER() OVER (ORDER BY x1.object_id) % 5
  FROM master.sys.objects x1, master.sys.objects x2, master.sys.objects x3

INSERT #B (key1, value1, value2)
SELECT key1, value1, value2
  FROM #A

GO
-- create holes but keep SOME overlap
DELETE #A WHERE value1 > value2 -- removes 3 records
DELETE #B WHERE value1 < value2 -- removes 3 records 

GO
-- show effect on tables
--SELECT * FROM #A ORDER BY key1
--SELECT * FROM #B ORDER BY key1

GO
-- create complete overview
SELECT key1 = ISNULL(a.key1, b.key1),
       value1a = a.value1, value2a = a.value2,
       value1b = b.value1, value2b = b.value2
  FROM #A a
  FULL OUTER JOIN #B b
               ON b.key1 = a.key1
 ORDER BY 1

GO
-- what if we only want the even records
-- THIS DOES NOT WORK !
SELECT key1 = ISNULL(a.key1, b.key1),
       value1a = a.value1, value2a = a.value2,
       value1b = b.value1, value2b = b.value2
  FROM #A a
  FULL OUTER JOIN #B b
               ON b.key1 = a.key1
              AND b.is_even = 1
 WHERE a.is_even = 1 
 ORDER BY 1

I know why it doesn't work; I just wonder what would be the most clear approach to make it work and remain readable for other people. Bonus points if it also works on systems other than MSSQL.

"My" solutions so far are:

By catching the NULL due to the OUTER effect:

SELECT key1 = ISNULL(a.key1, b.key1),
       value1a = a.value1, value2a = a.value2,
       value1b = b.value1, value2b = b.value2
  FROM #A a
  FULL OUTER JOIN #B b
               ON b.key1 = a.key1

 WHERE ISNULL(a.is_even, b.is_even) = 1
 ORDER BY 1

By means of a CTE

 ;WITH a (key1, value1, value2)
    AS (SELECT key1, value1, value2 
          FROM #A
         WHERE is_even = 1),
       b (key1, value1, value2)
    AS (SELECT key1, value1, value2 
          FROM #B
         WHERE is_even = 1)

 SELECT key1 = ISNULL(a.key1, b.key1),
       value1a = a.value1, value2a = a.value2,
       value1b = b.value1, value2b = b.value2
  FROM a
  FULL OUTER JOIN b
               ON b.key1 = a.key1
 ORDER BY 1

By means of subqueries

SELECT key1 = ISNULL(a.key1, b.key1),
       value1a = a.value1, value2a = a.value2,
       value1b = b.value1, value2b = b.value2
  FROM  (SELECT key1, value1, value2 
           FROM #A
          WHERE is_even = 1) a
  FULL OUTER JOIN (SELECT key1, value1, value2 
                     FROM #B
                    WHERE is_even = 1) b
               ON b.key1 = a.key1
 ORDER BY 1

Although I prefer the first solution, the CTE and/or subquery solutions look more obvious even though they add a LOT of fluff to the code. (And I don't like CTE's very much =)

Any opinions ? Other solutions ? Remarks (e.g. concerning performance on 'real' data)

like image 595
deroby Avatar asked Oct 03 '22 07:10

deroby


1 Answers

Your two approaches "With CTE" and "with subqueries" are exactly the same thing, it is just personal preference as to which you would use.

All 3 queries have the same estimated cost and the same I/O:

enter image description here

Table '#B'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#A'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

However the first has an additional step Filter because the subquery/CTE approach is able to apply the predicate is_even = 1 at the same time as the clustered index scan.

enter image description here

So I would go for either the subquery approach or the CTE approach depending on which you prefer visually. Don't be fooled into thinking less is always less when it comes to SQL, it can be more efficient to write more verbose queries.

like image 195
GarethD Avatar answered Oct 09 '22 12:10

GarethD