Are the following queries effective in MySQL:
SELECT * FROM table WHERE field & number = number;
# to find values with superset of number's bits
SELECT * FROM table WHERE field | number = number;
# to find values with subset of number's bits
...if an index for the field has been created?
If not, is there a way to make it run faster?
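For reference, the two conditions are standard bit-mask tests; a minimal Python illustration (function names are my own):

```python
def has_all_bits(field, number):
    # Superset test: field & number == number holds exactly when every
    # bit set in number is also set in field.
    return field & number == number

def has_only_bits(field, number):
    # Subset test: field | number == number holds exactly when field
    # sets no bits outside number.
    return field | number == number
```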
Update:
See this entry in my blog for performance details:
SELECT * FROM table WHERE field & number = number
SELECT * FROM table WHERE field | number = number
This index can be effective in two ways:

1. Scanning the index instead of the table: index entries are much smaller than table rows, so a full index scan reads far less data than a full table scan.
2. Range access: using the index to scan only a range of keys.

Neither condition in the queries above is sargable, that is, the index will not be used for a range scan (with the conditions as they are now). However, point 1 still holds, and the index can be useful.
If your table contains, say, 100 bytes per row on average, and 1,000,000 records, then a table scan will need to read 100 MB of data. If you have an index (with a 4-byte key, a 6-byte row pointer and some internal overhead), the query will need to scan only 10 MB of data, plus additional data from the table for the rows that pass the filter.
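The scan-size arithmetic above can be checked directly (figures as in the answer; the 10-byte index entry is the 4-byte key plus the 6-byte row pointer, ignoring internal overhead):

```python
rows = 1_000_000
row_bytes = 100            # average row size from the example
index_entry_bytes = 4 + 6  # 4-byte key + 6-byte row pointer

table_scan_mb = rows * row_bytes / 1_000_000
index_scan_mb = rows * index_entry_bytes / 1_000_000
# a full table scan reads ~100 MB, a full index scan only ~10 MB
```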
Both these queries will require scanning the whole index. But by rewriting the AND query you can benefit from range access on the index too.
This condition:
field & number = number
can only match a field if the highest bits set in number are set in the field too.
So you can simply add this extra condition to the query:
SELECT *
FROM table
WHERE field & number = number
AND field >= 0xFFFFFFFF & ~((2 << FLOOR(LOG(2, 0xFFFFFFFF & ~number))) - 1)
This will use the range for coarse filtering and the condition for fine filtering.
The more low bits of number are unset, the better.
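The extra condition can be checked by brute force over a small bit width. A sketch in Python (my own translation of the MySQL expression; it assumes number has at least one zero bit, since LOG(2, 0) is undefined):

```python
import math

def lower_bound(number, mask=0xFFFFFFFF):
    # MySQL: mask & ~((2 << FLOOR(LOG(2, mask & ~number))) - 1)
    h = math.floor(math.log2(mask & ~number))  # position of the highest unset bit of number
    return mask & ~((2 << h) - 1)              # keep only the leading run of ones in number

# Brute-force check over an 8-bit space: every field matching the
# superset condition is >= the bound, so the range scan loses no rows.
number = 0b11100101
bound = lower_bound(number, mask=0xFF)
matches = [f for f in range(256) if f & number == number]
assert all(f >= bound for f in matches)
```

Any field satisfying `field & number = number` must contain all of number's leading ones, so it cannot be smaller than those leading ones alone, which is exactly what the expression computes.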
I doubt the optimizer would figure that one out by itself...
Maybe you can call EXPLAIN on these queries and confirm my pessimistic guess (remembering, of course, that many query-plan decisions are based on the specific instance of a given database, i.e. varying amounts of data, or merely data with a different statistical profile, may produce distinct plans).
Assuming that the table has a significant number of rows, and that the "bitwise" criteria remain selective enough, a possible optimization is to avoid running a bitwise operation on every single row by rewriting the query with an IN construct (or with a JOIN).
Something like this (conceptual, i.e. not tested):
CREATE TEMPORARY TABLE tblFieldValues
(Field INT);
INSERT INTO tblFieldValues
SELECT DISTINCT Field
FROM table;
-- SELECT * FROM table WHERE field | number = number;
-- now becomes
SELECT *
FROM table t
WHERE t.field IN
  (SELECT Field
   FROM tblFieldValues
   WHERE Field | number = number);
The full benefits of an approach like this need to be evaluated with different use cases (all of them with a sizeable number of rows in table, since otherwise the direct "WHERE field | number = number" approach is efficient enough), but I suspect this could be significantly faster. Further gains can be achieved if tblFieldValues doesn't need to be recreated each time. Efficient creation of this table of course implies an index on Field in the original table.
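The rewrite can be sketched end to end with SQLite standing in for MySQL (table contents and the value of number are illustrative, not from the original post):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, field INTEGER)")
conn.executemany("INSERT INTO t (field) VALUES (?)",
                 [(f,) for f in [1, 2, 3, 2, 6, 7, 6, 4]])

number = 6  # 0b110: match rows whose field sets no bits outside 0b110

# Distinct field values stand in for the temporary table; the bitwise
# filter then runs once per distinct value instead of once per row.
conn.execute("CREATE TEMP TABLE tblFieldValues AS SELECT DISTINCT field FROM t")
rows = conn.execute(
    "SELECT id, field FROM t WHERE field IN "
    "(SELECT field FROM tblFieldValues WHERE field | ? = ?)",
    (number, number),
).fetchall()
# rows holds only the records whose field is a bit-subset of number
```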