Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PostgreSQL: Create an index to quickly distinguish NULL from non-NULL values

Consider a SQL query with the following WHERE predicate:

... WHERE name IS NOT NULL ... 

Where name is a textual field in PostgreSQL.

No other query checks any textual property of this value, just whether it is NULL or not. Therefore, a full btree index seems like an overkill, even though it supports this distinction:

Also, an IS NULL or IS NOT NULL condition on an index column can be used with a B-tree index.

What's the right PostgreSQL index to quickly distinguish NULLs from non-NULLs?

like image 633
Adam Matan Avatar asked Aug 12 '15 13:08

Adam Matan


People also ask

Can index have NULL values Postgres?

PostgreSQL will not index NULL values. This is an important point. Because an index will never include NULL values, it cannot be used to satisfy the ORDER BY clause of a query that returns all rows in a table.

IS NULL IS NOT NULL index?

To index an IS NULL condition in the Oracle database, the index must have a column that can never be NULL . That said, it is not enough that there are no NULL entries. The database has to be sure there can never be a NULL entry, otherwise the database must assume that the table has rows that are not in the index.

How do I deal with not null in PostgreSQL?

Here is an example of how to use the PostgreSQL IS NOT NULL condition in a SELECT statement: SELECT * FROM employees WHERE first_name IS NOT NULL; This PostgreSQL IS NOT NULL example will return all records from the employees table where the first_name does not contain a null value.

Can index be created on NULL columns?

To get around the optimization of SQL queries that choose NULL column values, we can create a function-based index using the null value built-in SQL function to index only on the NULL columns.


1 Answers

I'm interpreting you claim that it's "overkill" in two ways: in terms of complexity (using a B-Tree instead of just a list) and space/performance.

For complexity, it's not overkill. A B-Tree index is preferable because deletes from it will be faster than some kind of "unordered" index (for lack of a better term). (An unordered index would require a full index scan just to delete.) In light of that fact, any gains from an unordered index would be usually be outweighed by the detriments, so the development effort isn't justified.

For space and performance, though, if you want a highly selective index for efficiency, you can include a WHERE clause on an index, as noted in the fine manual:

CREATE INDEX ON my_table (name) WHERE name IS NOT NULL; 

Note that you'll only see benefits from this index if it can allow PostgreSQL to ignore a large amount of rows when executing your query. E.g., if 99% of the rows have name IS NOT NULL, the index isn't buying you anything over just letting a full table scan happen; in fact, it would be less efficient (as @CraigRinger notes) since it would require extra disk reads. If however, only 1% of rows have name IS NOT NULL, then this represents huge savings as PostgreSQL can ignore most of the table for your query. If your table is very large, even eliminating 50% of the rows might be worth it. This is a tuning problem, and whether the index is valuable is going to depend heavily on the size and distribution of the data.

Additionally, there is very little gain in terms of space if you still need another index for the name IS NULL rows. See Craig Ringer's answer for details.

like image 107
jpmc26 Avatar answered Sep 20 '22 01:09

jpmc26