I am having a ~90 MB SQLite database on a SSD consisting mostly on message attachments including a BLOB column content, that stores the binary attachment data.
Now I found that the following query
SELECT message_id FROM attachments WHERE length(content) IS NULL;
is 500x faster (0.5ms vs. 250ms) than the original
SELECT message_id FROM attachments WHERE content IS NULL;
Is it true, that both queries are equivalent?
Additional info
In SQLite, the length and type of each column value are stored at the beginning of the row.
This allows the length()
and typeof()
functions to be optimized to avoid loading the actual value.
The IS NULL operator has no such optimization (although it would be possible to implement it).
I made a script to benchmark both functions. length(x) IS NULL
is faster unless you have mostly NULL
values.
IS NULL
: 11.343180236999842length(x) IS NULL
: 7.824154090999855IS NULL
: 15.019244787999924length(x) IS NULL
: 7.527420233999919IS NULL
: 6.184766045999822length(x) IS NULL
: 6.448342310000044import sqlite3
import timeit
conn = sqlite3.connect("test.db")
c = conn.cursor()
c.execute("DROP TABLE IF EXISTS test")
c.execute("CREATE TABLE test (data BLOB)")
for i in range(10000):
# Modify this to change data
if i % 2 == 0:
c.execute("INSERT INTO test(data) VALUES (randomblob(1024))")
else:
c.execute("INSERT INTO test(data) VALUES (NULL)")
def timeit_isnull():
c.execute("SELECT data IS NULL AS dataisnull FROM test")
c.fetchall()
def timeit_lenisnull():
c.execute("SELECT length(data) IS NULL AS dataisnull FROM test")
c.fetchall()
print(timeit.timeit(timeit_isnull, number=1000))
print(timeit.timeit(timeit_lenisnull, number=1000))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With