Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PostgreSQL implementation of set operators

Hi all and thanks in advance for your time.

  • The ANSI SQL standard defines the set operators UNION, EXCEPT, and INTERSECT.
  • Each of these has 2 variants - DISTINCT and ALL.
  • In addition, each can be of type SET or MULTISET.

AFAIK:

  • Only PostgreSQL supports the ALL variant for EXCEPT and INTERSECT.
  • Only Oracle supports MULTISETs.

UPDATE - Schwern informed me in a comment that MariaDB also supports ALL for set operators

For this question I'll use an EXCEPT example, although the same is true for INTERSECT.

The ANSI SQL (ISO/IEC CD 9075-2) standard defines the following:

Page 454 Section 7.16 'query expression' - Item 16: If a set operator is specified in a 'query term' or a 'query expression body', then: a) Let T1, T2, and TR be respectively the first operand, the second operand, and the result of the 'query term' or 'query expression body'. b) Let TN1 and TN2 be the effective names for T1 and T2, respectively. c) If the set operator is UNION DISTINCT, EXCEPT ALL, EXCEPT DISTINCT, INTERSECT ALL, or INTERSECT DISTINCT, then each column of T1 and T2 is an operand of a grouping operation...

Page 54 Section 4.10 Collection Types - Item 6.2: MULTISET EXCEPT is an operator that computes the multiset difference of two multisets. There are two variants, ALL and DISTINCT. The variant specified by ALL places in the result a number of instances of a value, equal to the number of instances of the value in the first operand minus the number of instances of the value in the second operand. The variant specified by DISTINCT removes duplicates from the result.

With that in mind, consider the following script (PostgreSQL 12):

CREATE TABLE T1 (C1 INT);
CREATE TABLE T2 (C1 INT);

INSERT INTO T1 VALUES (1), (2), (2), (3), (3), (3);
INSERT INTO T2 VALUES (3);

SELECT  * FROM T1
EXCEPT DISTINCT
SELECT * FROM T2;

SELECT * FROM T1
EXCEPT ALL
SELECT * FROM T2;

DROP TABLE T1, T2;

The result for the EXCEPT DISTINCT (default) is as expected: 2 rows with 1, 2.

The result for the EXCEPT ALL is not what I expect. It returns 5 rows with 3, 3, 2, 2, 1.

The PostgreSQL documentation confirms this behavior, but based on the ANSI definition I would expect to get back only 3 rows with 2, 2, 1.

The first quote suggests that the DISTINCT applies a grouping operation to the rows that qualify the operation. IMHO in this example 3 should not qualify the EXCEPT, and therefore should not be returned at all.

The behavior of subtracting (m-n) elements seems to be inline with a MULTISET operation as defined in the second quote above. However, PostgreSQL documentation state it does not support even basic multisets, and of course those require that we explicit specify MULTISET EXCEPT ALL

What am I missing here?

Based on the comments and replies I received I would like to clarify that I understand what EXCEPT ALL does here. PostgreSQL documentation is clear on this. My question is if this is the correct behavior for (SET) EXCEPT ALL. My understanding is that this is what MULTISET EXCEPT ALL is for.

Thank you in advance for your thoughts.

Be safe and healthy!

like image 697
SQLRaptor Avatar asked Nov 15 '22 08:11

SQLRaptor


1 Answers

After some more research and deciphering the standard, I believe that my original assumption was wrong, and that PostgreSQL behavior is correct.

The quote about the grouping refers to both DISTINCT and ALL. What I should have looked at is P462-463 where the standard explains:

Case:

a) If no set operator is specified, then T is the result of the specified .

b) Otherwise:

i) Let D be the degree of T.

ii) For each i, 1 (one) ≤ i ≤ D:

  1. 14 Let DTCi be the declared type of the i-th column of T.

  2. Let TCN1i be the effective name for the i-th column of T1.

  3. Let TCN2i be the effective name for the i-th column of T2.

  4. Let ET1 be a of the form SELECT CAST(TCN11 AS DTC1), CAST(TCN12 AS DTC2), ..., CAST(TCN1D AS DTCD) FROM TN1

  5. Let ET2 be a of the form SELECT CAST(TCN21 AS DTC1), CAST(TCN22 AS DTC2), ..., CAST(TCN2D AS DTCD) FROM TN2

iii) T contains the following rows:

  1. Let R be a row that is a duplicate of some row in ET1 or of some row in ET2 or both. Let m be the number of duplicates of R in ET1 and let n be the number of duplicates of R in ET2, where m ≥ 0 and n ≥ 0.

  2. If DISTINCT is specified or implicit, then

Case:

A) If UNION is specified, then

Case:

I) If m > 0 or n > 0, then T contains exactly one duplicate of R.

II) Otherwise, T contains no duplicate of R.

B) If EXCEPT is specified, then

Case:

I) If m > 0 and n = 0, then T contains exactly one duplicate of R.

II) Otherwise, T contains no duplicate of R.

C) If INTERSECT is specified, then

Case:

I) If m > 0 and n > 0, then T contains exactly one duplicate of R.

II) Otherwise, T contains no duplicates of R.

3) If ALL is specified, then

Case:

A) If UNION is specified, then the number of duplicates of R that T contains is (m + n).

B) If EXCEPT is specified, then the number of duplicates of R that T contains is the maximum of (m – n) and 0 (zero).

C) If INTERSECT is specified, then the number of duplicates of R that T contains is the minimum of m and n.

like image 149
SQLRaptor Avatar answered Dec 26 '22 19:12

SQLRaptor