Postges - Multi Column Index - Leading (leftmost) column

Question

I am curious about the concept of Leading column for multi-column indexes.

I'm using this sample dvdrental db.

Here's the query:

SELECT
  title,
  length,
  rating,
  replacement_cost,
  rental_rate
FROM film
WHERE length BETWEEN 60 AND 70
  AND rating = 'G';

I have two indexes that I'm playing around with:

#1

CREATE INDEX IF NOT EXISTS film_idx_length_rating
  ON film(length, rating);

#2

CREATE INDEX IF NOT EXISTS film_idx_rating_length
  ON film(rating, length);

Plan with index selected by the planner after both indexes are created:

QUERY PLAN
Bitmap Heap Scan on film  (cost=4.47..39.34 rows=15 width=34) (actual time=0.020..0.033 rows=18 loops=1)
  Recheck Cond: ((rating = 'G'::mpaa_rating) AND (length >= 60) AND (length <= 70))
  Heap Blocks: exact=14
  ->  Bitmap Index Scan on film_idx_rating_length  (cost=0.00..4.46 rows=15 width=0) (actual time=0.015..0.015 rows=18 loops=1)
        Index Cond: ((rating = 'G'::mpaa_rating) AND (length >= 60) AND (length <= 70))
Planning Time: 0.202 ms
Execution Time: 0.065 ms

Having executed the query plan EXPLAIN ANALYZE, the planner chooses the second query but the query plans from both indexes doesn't really have any significant difference only the index selected by the planner. Why is that? Why is that when rating is used as the leading column, it gets picked rather the one with length as leading column?

From the docs there's this:

A multicolumn B-tree index can be used with query conditions that involve any subset of the index's columns, but the index is most efficient when there are constraints on the leading (leftmost) columns. The exact rule is that equality constraints on leading columns, plus any inequality constraints on the first column that does not have an equality constraint, will be used to limit the portion of the index that is scanned.

But I don't fully understand it, perhaps someone can give me an example?

Thanks!

jjanes · Accepted Answer

You will probably need to make the table a few thousand times larger than the one at that link before you start to see meaningful differences.

For the index that starts with equality column, it can jump to the 'G' section within the index, it can then jump to 60 for the length, and read forward until it exceeds 70. All of those rows will then meet both qualifications.

But for the other index, it can't just jump to 60, and then jump to the G section, because there is no a single G section. There is one G section for each distinct value between 60 and 70. So what it ends up doing is scanning all rows from 60 to 70, individually filtering out ones which are not G.

It turns out the difference is not that big, because most of the time is spent visiting the table heap to fetch the data it needs from the table, and in this case the same set of rows need to visited for either index.

Postges - Multi Column Index - Leading (leftmost) column

Tags:

postgresql

Aaron

1 Answers

jjanes

Recent Activity

Donate For Us

Postges - Multi Column Index - Leading (leftmost) column

Tags:

postgresql

Aaron

1 Answers

jjanes

Related questions

Recent Activity

Donate For Us