Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Weird "all" in"where"

Tags:

sql

mysql

The problem exists in MySQL's newest version, so I even doubt that that may be a bug.

Here are two tables:

t1(id int), values (10),(2)
t2(id int), values (0),(null),(1)

Execute:

select id from t1 where id > all (select id from t2);

Return result set:

+------+
| id   |
+------+
|   10 |
|    2 |
+------+

According to my knowledge and the page http://dev.mysql.com/doc/refman/5.5/en/all-subqueries.html

The statement should return empty result! Because each judgement in "where" leads to null, like this:

select id > all (select id from t2)  as c1 from t1;

returns:

+------+
| c1   |
+------+
| NULL |
| NULL |
+------+

and actually select id from t1 where null; returns nothing!

Finally, I tried this:

explain extended select id from t1 where id > all (select id from t2);
show warnings;
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                             |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | select `test`.`t1`.`id` AS `id` from `test`.`t1` where <not>((`test`.`t1`.`id` <= (select max(`test`.`t2`.`id`) from `test`.`t2`))) |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------+

1 row in set (0.00 sec)

We can see that MySQL optimizes the original SQL to this one, which actually fits the result set.

but I not think the optimized SQL equals the original one .

Am I wrong?

like image 580
carl Avatar asked Jul 13 '12 00:07

carl


2 Answers

Update: Upon further analysis and unfolding of MySQL's > ALL odd implementation. This answer should be considered as MySQL-specific. So for further disclaimer, explanation on answer here regarding > ALL is not applicable to other RDBMSes (unless there are other RDBMSes that copied MySQL implementation). Internal translation from > ALL to a MAX construct, applies to MySQL only.

This:

select id from t1 where id > all (select id from t2); 

is semantically equivalent to:

select id from t1 where id > (select max(id) from t2); 

Since select max(id) from t2 returns 1, the second query materializes to this:

select id from t1 where id > 1

That's why it returns both 10 and 2 from table t1


One of the instances where NULL rules is being applied is when you use NOT IN, an example:

DDL:

create table t1(id int);

insert into t1 values (10),(2);


create table t2(id int); 

insert into t2 values (0),(null),(1);

Query:

select * from t1 where id not in (select id from t2);

-- above is evaluated same as the following query, so the rules about null applies,
-- hence the above and following query will not return any record.    

select * from t1 where id <> 0 and id <> null and id <> 1;



-- to eliminate null side-effect, do this:
select * from t1 where id not in (select id from t2 where id is not null);

-- which is equivalent to this:
select * from t1 where id <> 0 and id <> 1;

The last two queries returns 10 and 2, whereas the first two queries returns empty set

Live test: http://www.sqlfiddle.com/#!2/82865/1

Hope these examples erases your confusion with NULL rules.


Regarding

but I not think the optimized sql equals the original one .

Optimized sql being this:

select `test`.`t1`.`id` AS `id` from `test`.`t1` where <not>((`
test`.`t1`.`id` <= (select max(`test`.`t2`.`id`) from `test`.`t2`)))

That is really equivalent to your original query: select id from t1 where id > all (select id from t2);

The construct t1.field > all (select t2.field from t2) is just a syntactic sugar for:

t1.field > (select max(t2.field) from t2)

If you will apply DeMorgan theorem on the optimized SQL by MySql:

not (t1.id <= (select max(t2.id) from t2))

That is equivalent to:

t1.id > (select max(t2.id) from t2)

Which in turn is equivalent to the syntactic sugar ALL:

t1.id > ALL(select t2.id from t2)
like image 192
Michael Buen Avatar answered Nov 15 '22 04:11

Michael Buen


This is a bug in MySQL (Reported and verified here).

The fix will be available in 5.6.7 (next 5.6x version) as well as in the next major tree (5.7x)

It differs from the stated behaviour in the MySQL docs and that prescribed in the ANSI standard.

Moreover it is not even consistent in MySQL and you get different results when the sub query references a table compared to when the sub query contains (the same) literal values.

CREATE TABLE t2
  (
     id INT
  );

INSERT INTO t2
VALUES      (0),
            (NULL),
            (1);

/*Returns row with 10*/
SELECT *
FROM   (SELECT 10 AS id) T
WHERE  id > ALL (SELECT id
                 FROM   t2);

/*Returns no rows. Explain Plan says "Impossible Where"*/
SELECT *
FROM   (SELECT 10 AS id) T
WHERE  id > ALL (SELECT 0
                 UNION ALL
                 SELECT NULL
                 UNION ALL
                 SELECT 1); 

The second behaviour is correct per the spec. How 10 > ALL( (0),(null),(1) ) ought to be logically evaluated as follows

10 > 0 =  TRUE
10 > NULL =  UNKNOWN
10 > 1 =  TRUE

Under the rules of three valued logic

TRUE AND UNKNOWN AND TRUE = UNKNOWN

So this row should not be returned. See the ANSI specification which clearly states

The result of "R <comp op> <quantifier> T" is derived by the application of the implied <comparison predicate> "R <comp op> RT" to every row RT in T:

Therefore this is not a semantically valid optimisation when T is Nullable. The full section of the spec is reproduced below.

8.7

     Function

     Specify a quantified comparison.

     Format

     <quantified comparison predicate> ::=
          <row value constructor> <comp op> <quantifier> <table subquery>


     <quantifier> ::= <all> | <some>

     <all> ::= ALL

     <some> ::= SOME | ANY


     Syntax Rules

     1) The <row value constructor> shall be of the same degree as the
        result of the <table subquery>.

     2) The data types of the values of the <row value constructor>
        shall be respectively comparable to those of the columns of the
        <table subquery>.

     3) The collating sequence for each pair of respective values in
        the <quantified comparison predicate> is determined in the same
        manner as described in Subclause 8.2, "<comparison predicate>".

     Access Rules

        None.

     General Rules

     1) Let R be the result of the <row value constructor> and let T be
        the result of the <table subquery>.

     2) The result of "R <comp op> <quantifier> T" is derived by the
        application of the implied <comparison predicate> "R <comp op>
        RT" to every row RT in T:

        Case:

        a) If T is empty or if the implied <comparison predicate> is
          true for every row RT in T, then "R <comp op> <all> T" is
          true.

        b) If the implied <comparison predicate> is false for at least
          one row RT in T, then "R <comp op> <all> T" is false.

        c) If the implied <comparison predicate> is true for at least
          one row RT in T, then "R <comp op> <some> T" is true.

        d) If T is empty or if the implied <comparison predicate> is
          false for every row RT in T, then "R <comp op> <some> T" is
          false.

        e) If "R <comp op> <quantifier> T" is neither true nor false,
          then it is unknown.
like image 38
Martin Smith Avatar answered Nov 15 '22 05:11

Martin Smith