Consider a SQL query with the following WHERE
predicate:
... WHERE name IS NOT NULL ...
where name is a textual field in PostgreSQL. No other query checks any textual property of this value, just whether it is NULL or not. A full B-tree index therefore seems like overkill, even though it supports this distinction:
Also, an IS NULL or IS NOT NULL condition on an index column can be used with a B-tree index.
What's the right PostgreSQL index to quickly distinguish NULLs from non-NULLs?
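For context, here is a minimal sketch of the kind of setup I have in mind (table and column names are illustrative, not my actual schema):

-- Hypothetical table: only the NULL-ness of "name" is ever queried.
CREATE TABLE my_table (
    id   bigserial PRIMARY KEY,
    name text          -- nullable; never sorted, compared, or pattern-matched
);

-- The only predicate ever applied to "name":
SELECT * FROM my_table WHERE name IS NOT NULL;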
I'm interpreting your claim that it's "overkill" in two ways: in terms of complexity (using a B-tree instead of just a list) and in terms of space/performance.
For complexity, it's not overkill. A B-tree index is preferable because deletes from it will be faster than from some kind of "unordered" index (for lack of a better term), which would require a full index scan just to find the entry to delete. In light of that, any gains from an unordered index would usually be outweighed by the drawbacks, so the development effort isn't justified.
For space and performance, though, if you want a highly selective index for efficiency, you can include a WHERE
clause on an index, as noted in the fine manual:
CREATE INDEX ON my_table (name) WHERE name IS NOT NULL;
Note that you'll only see benefits from this index if it allows PostgreSQL to skip a large number of rows when executing your query. For example, if 99% of the rows have name IS NOT NULL, the index isn't buying you anything over a full table scan; in fact, it would be less efficient (as @CraigRinger notes), since it would require extra disk reads. If, however, only 1% of rows have name IS NOT NULL, then this represents a huge saving, as PostgreSQL can ignore most of the table for your query. If your table is very large, even eliminating 50% of the rows might be worth it. This is a tuning problem, and whether the index is valuable depends heavily on the size and distribution of the data.
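As a quick sanity check (only a sketch, reusing the hypothetical my_table above), you can ask the planner whether it actually chooses the partial index for this predicate:

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM my_table WHERE name IS NOT NULL;
-- An Index Scan or Bitmap Index Scan on the partial index means the planner
-- matched the index's WHERE clause to the query predicate; a Seq Scan usually
-- means the predicate isn't selective enough for the index to be worthwhile.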
Additionally, there is very little gain in terms of space if you still need another index for the name IS NULL rows. See Craig Ringer's answer for details.
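For completeness, and only as an illustrative sketch (not a summary of that answer), the opposite case gets its own partial index, subject to the same selectivity caveats:

CREATE INDEX ON my_table (name) WHERE name IS NULL;
-- Every key in this index is NULL, so the indexed column mainly acts as a
-- row locator; any small column of my_table would work equally well here.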