Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Indexing nulls for fast searching on DB2

It's my understanding that nulls are not indexable in DB2, so assuming we have a huge table (Sales) with a date column (sold_on) which is normally a date, but is occasionally (10% of the time) null.

Furthermore, let's assume that it's a legacy application that we can't change, so those nulls are staying there and mean something (let's say sales that were returned).

We can make the following query fast by putting an index on the sold_on and total columns

Select * from Sales 
where 
Sales.sold_on between date1 and date2
and Sales.total = 9.99

But an index won't make this query any faster:

Select * from Sales 
where 
Sales.sold_on is null
and Sales.total = 9.99

Because the indexing is done on the value.

Can I index nulls? Maybe by changing the index type? Indexing the indicator column?

like image 413
Dave Avatar asked Dec 31 '22 10:12

Dave


2 Answers

From where did you get the impression that DB2 doesn't index NULLs? I can't find anything in documentation or articles supporting the claim. And I just performed a query in a large table using a IS NULL restriction involving an indexed column containing a small fraction of NULLs; in this case, DB2 certainly used the index (verified by an EXPLAIN, and by observing that the database responded instantly instead of spending time to perform a table scan).

So: I claim that DB2 has no problem with NULLs in non-primary key indexes.

But as others have written: Your data may be composed in a way where DB2 thinks that using an index will not be quicker. Or the database's statistics aren't up-to-date for the involved table(s).

like image 51
Troels Arvin Avatar answered Jan 01 '23 22:01

Troels Arvin


I'm no DB2 expert, but if 10% of your values are null, I don't think an index on that column alone will ever help your query. 10% is too many to bother using an index for -- it'll just do a table scan. If you were talking about 2-3%, I think it would actually use your index.

Think about how many records are on a page/block -- say 20. The reason to use an index is to avoid fetching pages you don't need. The odds that a given page will contain 0 records that are null is (90%)^20, or 12%. Those aren't good odds -- you're going to need 88% of your pages to be fetched anyway, using the index isn't very helpful.

If, however, your select clause only included a few columns (and not *) -- say just salesid, you could probably get it to use an index on (sold_on,salesid), as the read of the data page wouldn't be needed -- all the data would be in the index.

like image 34
Jonathan Rupp Avatar answered Jan 01 '23 22:01

Jonathan Rupp