Why is 'IS NULL' 100x slower than 'length() = 0' on a blob column?

Tags: sql, sqlite

I have a ~90 MB database consisting mostly of message attachments. The attachments table includes a BLOB column, content, that stores the binary attachment data.

I assume it is not wise to create an index over a BLOB column, so no indexes are involved apart from the autoindex.

To find empty attachments, I compared the following queries:

SELECT message_id FROM attachments WHERE content IS NULL;

and

SELECT message_id FROM attachments WHERE length(content) = 0;

which return the same rows in my use case.

Why does the first one take 250 ms and the second one only 1-2 ms (both on an SSD)? What is the reason behind that? Is there a hidden length index or something? Any insight is appreciated.

Additional info

  1. The EXPLAIN QUERY PLAN in both cases is

    0|0|0|SCAN TABLE attachments

  2. The negation, IS NOT NULL vs. length() != 0, shows the same performance difference: 250 ms vs. 2 ms.

  3. In combined queries that only include {NULL} columns, WHERE content IS NULL AND length(content) = 0; takes 250 ms, while WHERE length(content) = 0 AND content IS NULL; takes 2 ms.
Simon Warta asked Sep 29 '22 11:09


1 Answer

These are simply different queries. LENGTH is a scalar function which, according to the SQLite documentation, returns

(i) NULL if the input is NULL
(ii) 0 if the input is a string of zero length (or is convertible to one) or a BLOB of zero bytes.

Therefore the condition length(content) = 0 is true when content is an empty string or an empty BLOB, and is not satisfied when content is NULL: length(NULL) is NULL, and comparing NULL with 0 yields NULL rather than true, so the WHERE clause rejects the row.
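To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table and column names follow the question; the three rows are made up for illustration (one NULL, one empty BLOB, one non-empty BLOB).

```python
import sqlite3

# In-memory database; the schema mirrors the question, the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attachments (message_id INTEGER PRIMARY KEY, content BLOB)"
)
conn.execute(
    "INSERT INTO attachments (message_id, content) "
    "VALUES (1, NULL), (2, x''), (3, x'616263')"
)

# Rows where content is NULL.
null_ids = [row[0] for row in conn.execute(
    "SELECT message_id FROM attachments WHERE content IS NULL ORDER BY message_id"
)]

# Rows where content is a zero-length value. The NULL row is not matched,
# because length(NULL) is NULL and NULL = 0 is not true.
empty_ids = [row[0] for row in conn.execute(
    "SELECT message_id FROM attachments WHERE length(content) = 0 ORDER BY message_id"
)]

# length() applied to NULL comes back as NULL (None in Python).
length_of_null = conn.execute("SELECT length(NULL)").fetchone()[0]

print(null_ids)        # [1]
print(empty_ids)       # [2]
print(length_of_null)  # None
```

The two conditions select disjoint sets of rows here, so they can only return the same result when both sets happen to be empty or the data contains neither kind of row.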

Based on this, I guess that your table contains several NULL fields and only a few rows that actually contain a value. This is also supported by your second additional point, where you say that IS NOT NULL shows a comparable performance difference.
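One way to check that guess is to count how many rows fall into each category. The following is only a sketch: content_breakdown is a hypothetical helper name, and the demo rows are invented; against the real database you would open the file instead of ":memory:" and skip the setup statements.

```python
import sqlite3


def content_breakdown(conn):
    """Return (null_rows, empty_rows, nonempty_rows) for attachments.content."""
    row = conn.execute(
        """
        SELECT
            COALESCE(SUM(content IS NULL), 0),      -- rows with NULL content
            COALESCE(SUM(length(content) = 0), 0),  -- rows with zero-length content
            COALESCE(SUM(length(content) > 0), 0)   -- rows with actual data
        FROM attachments
        """
    ).fetchone()
    return tuple(row)


# Hypothetical demo data: two NULLs, one empty BLOB, one two-byte BLOB.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attachments (message_id INTEGER PRIMARY KEY, content BLOB)"
)
conn.execute(
    "INSERT INTO attachments (content) VALUES (NULL), (NULL), (x''), (x'00ff')"
)

breakdown = content_breakdown(conn)
print(breakdown)  # (2, 1, 1)
```

SUM skips NULL inputs, so the NULL rows do not contribute to the second and third counts; COALESCE only guards against an empty table.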

davidhigh answered Oct 02 '22 15:10