Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is there no "first greater/less than [or equal to]" comparison operator in SQL?

Tags:

sql

mysql

I am considering the proposal of 4 new comparison operators in SQL. These are similar to the >, <, >= and <= operators, but are true only if the value of each operand is, among all values satisfying the inequality, the one closest to the value of the other operand. Since one value is almost at the other value, I have come to the conclusion (after realizing that there is no first keyword, and after discarding the unique keyword) that a good choice would be to define these 4 new operators:

  • a @> b : true if a > b and no a' < a satisfies a' > b and no b' > b satisfies a > b'
  • a @< b : true if b @> a
  • a @>= b : true if a ≥ b and no a' < a satisfies a' ≥ b and no b' > b satisfies a ≥ b'
  • a @<= b : true if b @>= a

The question is: is there some good reason why operators like these don't exist already?


(2014-03-20) I reformulate the question, because the above formulation apparently isn't clear enough:

Is there a reason why operators like these should not exist?


The following examples are intended as a starting point for finding out where the problems with the @... operators could lie. I will use 3 MySQL tables:

create table ta (id int auto_increment, ca char, primary key(id), unique index(ca));
create table tb (id int auto_increment, cb char, primary key(id), index(cb));
create table tc (id int auto_increment, cc char, primary key(id));
insert into ta (ca) values ('A'),('E'),('I'),('O'),('U');
insert into tb (cb) values ('C'),('D'),('E'),('F'),('F'),('M'),('N'),('O'),('Z');
insert into tc (cc) values ('C'),('D'),('E'),('F'),('F'),('M'),('N'),('O'),('Z');

Example #1

When columns have unique values, the effect of the @... operators can be obtained by limiting the output of queries or subqueries to 1 row, although with a somewhat more clumsy syntax:

?> select * from ta where ca @> 'B'; -- currently not valid, equivalent to:
!> select * from ta where ca > 'B' order by ca limit 1;
+----+------+
| id | ca   |
+----+------+
|  2 | E    |
+----+------+

(limit 1 is specific to MySQL, MariaDB, PostgreSQL, etc., other RDBMSs have select top 1, where rownum = 1, etc.)

In table ta we have a unique index on column ca. This index can be exploited to get to the selected value at the same speed as for ca = 'E'. The optimizer may realize this, but if it doesn't, data structures may be set up for an unneeded scan starting at the selected value (MySQL's explain says that this is a range type query).

Example #2

When columns have non-unique values, limiting output rows is useless, and the syntax becomes even more clumsy:

?> select * from tb where cb @> 'E'; -- currently not valid, equivalent to:
!> select * from tb where cb = (select min(cb) from tb where cb > 'E');
+----+------+
| id | cb   |
+----+------+
|  4 | F    |
|  5 | F    |
+----+------+

Luckily, if I correctly read the output of explain, MySQL is smart enough to optimize the subquery away, but if it weren't, the index would be used twice instead of once.

In the case of the table tc, which has no index on column cc, MySQL makes two table scans. This is understandable, since a single table scan would mean using an unknown amount of storage for the temporary result.

Example #3

Suppose you need all pairs consisting of a value and its successor value:

?> select t1.ca as c1, t2.ca as c2
   from ta t1
   join ta t2 on t1.ca @< t2.ca; -- currently not valid, equivalent to:
!> select t1.ca as c1, t2.ca as c2
   from ta t1
   join ta t2 on t2.ca = (select min(ca) from ta where ca > t1.ca);
+------+------+
| c1   | c2   |
+------+------+
| A    | E    |
| E    | I    |
| I    | O    |
| O    | U    |
+------+------+

If I read the output of explain correctly, the MySQL optimizer is not able to do without the correlated subquery, while we humans would know better. Maybe with the help of special handling of the @... operators wired in, the optimizer would do a single scan?

Example #4

This is similar, but across two tables, one of which has a non-unique index:

?> select * from ta join tb on ca @< cb; -- currently not valid, equivalent to:
!> select * from ta join tb on cb = (select min(cb) from tb where cb > ca);
+----+------+----+------+
| id | ca   | id | cb   |
+----+------+----+------+
|  1 | A    |  1 | C    |
|  2 | E    |  4 | F    |
|  2 | E    |  5 | F    |
|  3 | I    |  6 | M    |
|  4 | O    |  9 | Z    |
|  5 | U    |  9 | Z    |
+----+------+----+------+

Here too, the MySQL optimizer does not optimize away the subquery, although (maybe with a hint by the @<) it could.

Example #5

(Added on 2014-03-20.) The @... operators seem to make sense wherever their @-less counterparts do. Here is a contrived example of an expression in a where condition:

?> select * from ta join tb
   where round((ascii(ca)+ascii(cb))/2) @> ascii('E');
   -- currently not valid, equivalent to:
!> select * from ta join tb
   where round((ascii(ca)+ascii(cb))/2) = (
      select min(round((ascii(ca)+ascii(cb))/2)) from ta, tb
      where      round((ascii(ca)+ascii(cb))/2) > ascii('E')
   );
+----+------+----+------+
| id | ca   | id | cb   |
+----+------+----+------+
|  3 | I    |  1 | C    |
|  2 | E    |  4 | F    |
|  2 | E    |  5 | F    |
+----+------+----+------+

Example #6

... and this is another example, this time of a select expression:

?> select *, cb @< ca
   from tb, ta; -- currently not valid, equivalent to:
!> select *, ifnull(cb = (select max(cb) from tb where cb < ca), 0) as 'cb @< ca'
   from tb, ta;
+----+------+----+------+----------+
| id | cb   | id | ca   | cb @< ca |
+----+------+----+------+----------+
|  1 | C    |  1 | A    |        0 |
|  1 | C    |  2 | E    |        0 |
|  1 | C    |  3 | I    |        0 |
|  1 | C    |  4 | O    |        0 |
|  1 | C    |  5 | U    |        0 |
|  2 | D    |  1 | A    |        0 |
|  2 | D    |  2 | E    |        1 |
| -- (omitting rows with cb @< ca equal to 0 from here on)
|  4 | F    |  3 | I    |        1 |
|  5 | F    |  3 | I    |        1 |
|  7 | N    |  4 | O    |        1 |
|  8 | O    |  5 | U    |        1 |

I am aware of the following caveats:

Caveat #1

The @... operators are "non-local", because they need the knowledge of all possible values of their operands. This seems not to be a problem in all conditions of the kinds shown in the above examples, but could be a problem in other places (although I haven't found an example yet that could not be solved by an additional subquery).

Caveat #2

The @... operators, unlike their @-less counterparts, are not transitive. They share this property with the <> operator, though.

Caveat #3

Fully exploiting the @... operators could mean introducing new index and table access types (as discussed in the examples).


Please note that this question is not meant as a starting point for a discussion. I am looking for the reasons why something like the @... operators is not in the standard nor in any SQL dialect known to me - I expect these reasons to be related to some problem with the definition and/or implementation of these operators that I have overlooked.

I know that one reason is "Occam's razor" (pluralitas non est ponenda sine necessitate), but, as I have tried to show above, pluralitas here also brings some advantages (conciseness and ease of optimization). I am looking for stronger reasons.


(2014-03-31) @>, @<, @>= and @<= could become |>, |<, |>= and |<= or similar (read: first greater/less [equal]), in order not to collide with established usages of the @ identifier prefix.

like image 675
Walter Tross Avatar asked Mar 16 '14 14:03

Walter Tross


1 Answers

I am confused as to whether this is an appropriate forum for this question. But, the reason these operators do not exist is that they are not particularly useful and other ANSI SQL functionality takes their place.

First, the comparison operators in the on clause are, in every database I know, available in the where and case clauses. It is unclear how these operators would be used in these contexts.

Second, the operators don't specify what to do in the case of ties. Return all rows? But that would return multiple rows when the user of such an operator would expect only one row.

Third, ANSI standard functionality, such as row_number() can generate equivalent results. Although it might not be as optimal for this particular problem, it is more general. And standard.

By the way, Postgres has a nice capability with distinct on(), which is often more efficient than the analytic function equivalent.

I have secretly wanted a new join type, lookup join, that would fail if more than one record matched. However, I'm not so sure that the entire language should be changed for this purpose.

like image 181
Gordon Linoff Avatar answered Oct 02 '22 16:10

Gordon Linoff