Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast search on a blob starting bytes in SQLite

Tags:

sqlite

blob

Is there a way to index blob fields and have the index used for beginning of blob searches?

Currently I have hashes stored as hexadecimal in text fields. These hashes in hexadecimal form are 32 characters long, and form the bulk of the data in the database.

Problem is, they are often searched by their starting bytes, as in

select * from mytable where hash like '00a1b2%'

I would like to store them as blobs, as this saves about 30% of the database size. However while

select * from mytable where hex(hash) like '00a1b2%'

works, it's also much slower and does not seem to use the index.

Searching for exact blob matches does use the index, so the index is working.

Is there a way to perform a search on a blob start (with binary/memcmp "collation") that would use the index?

I also tried substr(), it's apparently faster than hex() but still not indexed

select * from mytable where substr(hash, 1, 6) = x'00a1b2'
like image 637
Eric Grange Avatar asked Jun 03 '14 09:06

Eric Grange


1 Answers

To be able to use an index for LIKE, the table column must have TEXT affinity, and the index must be case insensitive:

CREATE TABLE mytable(... hash TEXT, ...);
CREATE INDEX hash_index ON mytable(hash COLLATE NOCASE);

Functions like hex or substr prevent usage of indexes.


Blobs can be indexed and compared like other types. This allows you to express a prefix search with two comparisons:

SELECT * FROM mytable WHERE hash >= x'00a1b2' AND hash < x'00a1b3'
like image 173
CL. Avatar answered Nov 10 '22 15:11

CL.