Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Prevent Oracle minus statement from removing duplicates

Tags:

sql

oracle

Given these two tables:

CREATE TABLE TEST1 (TEST  VARCHAR2(1 BYTE))
CREATE TABLE TEST2 (TEST  VARCHAR2(1 BYTE))

Where TEST1 has two rows both with the value 'A' and TEST2 has one row with the value 'B'.

When I run this command:

SELECT TEST FROM TEST1
MINUS
SELECT TEST FROM TEST2

I get the output:

Test
-----
A

It appears that MINUS removes the duplicates (as there are two 'A' rows in TEST1).

How can you get MINUS query to include duplicate values (return two 'A' rows)?

like image 311
Marcus Leon Avatar asked May 13 '09 17:05

Marcus Leon


3 Answers

Another option:

SELECT TEST, row_number() OVER (PARTITION BY TEST ORDER BY TEST) FROM TEST1
MINUS
SELECT TEST, row_number() OVER (PARTITION BY TEST ORDER BY TEST) FROM TEST2

This would be MINUS with each duplicate treated as a distinct entry. Note in the example below, if TEST1 has two 'C' values and TEST2 has only one, you get one in the output.

dev> select * from test1;

T
-
A
A
B
C
C

dev> select * from test2;

T
-
B
C

dev>     SELECT TEST, row_number() OVER (PARTITION BY TEST ORDER BY TEST) FROM TEST1
  2      MINUS
  3      SELECT TEST, row_number() OVER (PARTITION BY TEST ORDER BY TEST) FROM TEST2
  4  /

T ROW_NUMBER()OVER(PARTITIONBYTESTORDERBYTEST)
- --------------------------------------------
A                                            1
A                                            2
C                                            2
like image 94
Dave Costa Avatar answered Nov 19 '22 02:11

Dave Costa


SELECT field1 FROM table1 WHERE field1 NOT IN (SELECT field2 FROM table2)

Will work so long as field2 can't contain NULLs.

like image 39
cagcowboy Avatar answered Nov 19 '22 03:11

cagcowboy


The "NOT IN" answers are all correct. An alternative, which might be easier for some scenarios, is the "NOT EXISTS" operator:

SELECT TEST FROM TEST1
WHERE NOT EXISTS
(SELECT null FROM TEST2 WHERE TEST2.TEST = TEST1.TEST);

(Note: the "null" in the select clause is meaningless here)

I personally use both methods, but I like the NOT EXISTS often because it is more flexible - it doesn't require the comparison to be on an equality condition, for example.

Recent versions of the optimiser will often convert a NOT IN to a NOT EXISTS, or vice versa; however, if you're on an older version (e.g. 8i or even 9i I think) you may see performance benefits from switching between these two methods.

like image 35
Jeffrey Kemp Avatar answered Nov 19 '22 01:11

Jeffrey Kemp