PostgreSQL 9.4 The table is created as follows:
CREATE TABLE foo (
id integer,
date date,
value numeric(14,3)
);
I'm optimizing a query using the ROW_NUMBER()
window function and COALESCE
. For the most efficient, I tend to use Index Only Scan
in the following query:
SELECT id, c_val
FROM (
SELECT id, COALESCE(value, 0) c_val, ROW_NUMBER() OVER(PARTITION BY id ORDER BY date DESC NULLS LAST) rn
FROM foo) sbt
WHERE sbt.rn = 1;
So, if I create the index as follows:
CREATE INDEX ON foo (id, date DESC NULLS LAST, value);
the planner choose to use Index Only Scan
, but if I do it in this way:
CREATE INDEX ON foo (id, date DESC NULLS LAST, COALESCE(value, 0));
the planner will do just Index Scan
.
Why? I'm trying to avoid the cost of evaluating the COALESCE
function while executing the query. Why isn't it work with Index Only Scan
?
I think you wrongly assumed that COALESCE(value, 0)
in your SELECT
matters in terms of an index usage. To tell the truth it's only view transformation done after the rows values are returned.
What matters as far as an index usage is concerned is your WINDOW FUNCTION
. Firstly you partition by id
and seccondly you order values in each partition by date DESC NULLS LAST
. This two things determine that index like CREATE INDEX ON foo (id, date DESC NULLS LAST, ...)
is useful whatever you put in next positions. Notice that if you change order of id
and date
at index creation, PostgreSQL will not use the index at all.
Now, you must know that INDEX ONLY SCAN
could be use only if the index itself stores the whole untouched row values requested by query. After PostgreSQL manual:
If the index stores the original indexed data values (and not some lossy representation of them), it is useful to support index-only scans, in which the index returns the actual data...
In your case your seccond index stores some lossy representation of a row because value of last column is transformed by a function and query asked for id
, value
and date
. PostgreSQL isn't so smart to see that it's only substiturion of NULLs
by 0
. For him it's not the original value. So we need to access table to get the original row values (in the end using plain INDEX SCAN
). After that values are formatted for output and COALESCE(values, 0)
happens.
Edit:
I think that explanation is sufficient to you as far as your question about internals is concerned. To talk about COALECE()
evaluation cost, I agree with a_horse_with_no_name that you probably shouldn't worry about that.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With