Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

(Bitwise) Supersets and Subsets in MySQL

Are the following queries effective in MySQL:

SELECT * FROM table WHERE field & number = number; 
# to find values with superset of number's bits

SELECT * FROM table WHERE field | number = number; 
# to find values with subset of number's bits

...if an index for the field has been created?

If not, is there a way to make it run faster?

like image 321
Meisner Avatar asked Sep 21 '09 22:09

Meisner


2 Answers

Update:

See this entry in my blog for performance details:

  • Bitwise operations and indexes

SELECT * FROM table WHERE field & number = number

SELECT * FROM table WHERE field | number = number

This index can be effective in two ways:

  1. To avoid early table scans (since the value to compare is contained in the index itself)
    • To limit the range of values examined.

Neither condition in the queries above is sargable, this is the index will not be used for the range scan (with the conditions as they are now).

However, point 1 still holds, and the index can be useful.

If your table contains, say, 100 bytes per row in average, and 1,000,000 records, then the table scan will need to scan 100 Mb of data.

If you have an index (with a 4-byte key, 6-byte row pointer and some internal overhead), the query will need to scan only 10 Mb of data plus additional data from the table if the filter succeeds.

  • The table scan is more efficient if your condition is not selective (you have high probablility to match the condition).
  • The index scan is more efficient if your condition is selective (you have low probablility to match the condition).

Both these queries will require scanning the whole index.

But by rewriting the AND query you can benefit from the ranging on the index too.

This condition:

field & number = number

can only match the fields if the highest bits of number set are set in the field too.

And you should just provide this extra condition to the query:

SELECT  *
FROM    table
WHERE   field & number = number
        AND field >= 0xFFFFFFFF & ~((2 << FLOOR(LOG(2, 0xFFFFFFFF & ~number))) - 1)

This will use the range for coarse filtering and the condition for fine filtering.

The more bits for number are unset at the end, the better.

like image 182
Quassnoi Avatar answered Oct 19 '22 22:10

Quassnoi


I doubt the optimizer would figure that one...

Maybe you can call EXPLAIN on these queries and confirm my pessimistic guess. (remembering of course that much of query plan decisions are based on the specific instance of a given database, i.e. variable amounts of data and/ore merely data with a different statistical profile may produce distinct plans).

Assuming that the table has a significant amount of rows, and that the "bitwised" criteria remain selective enough) a possible optimization is achieved when avoiding a bitwise operation on every single row, by rewriting the query with an IN construct (or with a JOIN)

Something like that (conceptual, i.e. not tested)

CREATE TEMPORARY TABLE tblFieldValues
  (Field INT);

INSERT INTO tblFieldValues
   SELECT DISTINCT Field
   FROM table;

-- SELECT * FROM table WHERE field | number = number; 
-- now becomes
SELECT * 
FROM table t
WHERE field IN 
    (SELECT Field 
     FROM tblFieldValues 
     WHERE field | number = number); 

The full benefits of an approach like this need to be evaluated with different use cases (all of which with a sizeable number of rows in table, since otherwise the direct "WHERE field | number = number" approach is efficient enough), but I suspect this could be significantly faster. Further gains can be achieved if the "tblFieldValues" doesn't need to be recreated each time. Efficient creation of this table of course implies an index on Field in the original table.

like image 43
mjv Avatar answered Oct 19 '22 22:10

mjv