Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is 'length() IS NULL' equivalent and faster than 'IS NULL' for BLOBs?

Tags:

sql

sqlite

I am having a ~90 MB SQLite database on a SSD consisting mostly on message attachments including a BLOB column content, that stores the binary attachment data.

Now I found that the following query

SELECT message_id FROM attachments WHERE length(content) IS NULL;

is 500x faster (0.5ms vs. 250ms) than the original

SELECT message_id FROM attachments WHERE content IS NULL;

Is it true, that both queries are equivalent?

Additional info

  1. No indexes are involved apart from the autoindex.
  2. It's not caching. The result can be reproducued unlimited times in any order from any number of SQLite processes.
like image 705
Simon Warta Avatar asked Nov 07 '14 12:11

Simon Warta


2 Answers

In SQLite, the length and type of each column value are stored at the beginning of the row. This allows the length() and typeof() functions to be optimized to avoid loading the actual value.

The IS NULL operator has no such optimization (although it would be possible to implement it).

like image 185
CL. Avatar answered Oct 07 '22 02:10

CL.


I made a script to benchmark both functions. length(x) IS NULL is faster unless you have mostly NULL values.

Results:

  • 50% alternating between random data and null:
    • IS NULL: 11.343180236999842
    • length(x) IS NULL: 7.824154090999855
  • Entirely blobs, no nulls:
    • IS NULL: 15.019244787999924
    • length(x) IS NULL: 7.527420233999919
  • Entirely nulls, no blobs:
    • IS NULL: 6.184766045999822
    • length(x) IS NULL: 6.448342310000044

Test script:

import sqlite3
import timeit

conn = sqlite3.connect("test.db")
c = conn.cursor()

c.execute("DROP TABLE IF EXISTS test")
c.execute("CREATE TABLE test (data BLOB)")

for i in range(10000):
    # Modify this to change data
    if i % 2 == 0:
        c.execute("INSERT INTO test(data) VALUES (randomblob(1024))")
    else:
        c.execute("INSERT INTO test(data) VALUES (NULL)")

def timeit_isnull():
    c.execute("SELECT data IS NULL AS dataisnull FROM test")
    c.fetchall()

def timeit_lenisnull():
    c.execute("SELECT length(data) IS NULL AS dataisnull FROM test")
    c.fetchall()

print(timeit.timeit(timeit_isnull, number=1000))
print(timeit.timeit(timeit_lenisnull, number=1000))
like image 24
Colonel Thirty Two Avatar answered Oct 07 '22 01:10

Colonel Thirty Two