Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why can't Index Only Scan be used on index created with COALESCE?

PostgreSQL 9.4 The table is created as follows:

CREATE TABLE foo (
    id integer,
    date date,
    value numeric(14,3)
);

I'm optimizing a query using the ROW_NUMBER() window function and COALESCE. For the most efficient, I tend to use Index Only Scan in the following query:

SELECT id, c_val
FROM (
    SELECT id, COALESCE(value, 0) c_val, ROW_NUMBER() OVER(PARTITION BY id ORDER BY date DESC NULLS LAST) rn
    FROM foo) sbt
WHERE sbt.rn = 1;

So, if I create the index as follows:

CREATE INDEX ON foo (id, date DESC NULLS LAST, value);

the planner choose to use Index Only Scan, but if I do it in this way:

CREATE INDEX ON foo (id, date DESC NULLS LAST, COALESCE(value, 0));

the planner will do just Index Scan.

Why? I'm trying to avoid the cost of evaluating the COALESCE function while executing the query. Why isn't it work with Index Only Scan?

like image 332
St.Antario Avatar asked Oct 31 '22 17:10

St.Antario


1 Answers

I think you wrongly assumed that COALESCE(value, 0) in your SELECT matters in terms of an index usage. To tell the truth it's only view transformation done after the rows values are returned.

What matters as far as an index usage is concerned is your WINDOW FUNCTION. Firstly you partition by id and seccondly you order values in each partition by date DESC NULLS LAST. This two things determine that index like CREATE INDEX ON foo (id, date DESC NULLS LAST, ...) is useful whatever you put in next positions. Notice that if you change order of id and date at index creation, PostgreSQL will not use the index at all.

Now, you must know that INDEX ONLY SCAN could be use only if the index itself stores the whole untouched row values requested by query. After PostgreSQL manual:

If the index stores the original indexed data values (and not some lossy representation of them), it is useful to support index-only scans, in which the index returns the actual data...

In your case your seccond index stores some lossy representation of a row because value of last column is transformed by a function and query asked for id, value and date. PostgreSQL isn't so smart to see that it's only substiturion of NULLs by 0. For him it's not the original value. So we need to access table to get the original row values (in the end using plain INDEX SCAN). After that values are formatted for output and COALESCE(values, 0) happens.

Edit:

I think that explanation is sufficient to you as far as your question about internals is concerned. To talk about COALECE() evaluation cost, I agree with a_horse_with_no_name that you probably shouldn't worry about that.

like image 198
Gabriel's Messanger Avatar answered Nov 15 '22 07:11

Gabriel's Messanger