Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is Bitmap Scan faster than Index Scan when fetching a moderately large percentage of the table in PostgreSQL?

The author of Bitmap Scan described the difference between Bitmap Heap Scan and Index Scan:

A plain indexscan fetches one tuple-pointer at a time from the index, and immediately visits that tuple in the table. A bitmap scan fetches all the tuple-pointers from the index in one go, sorts them using an in-memory "bitmap" data structure, and then visits the table tuples in physical tuple-location order. The bitmap scan improves locality of reference to the table at the cost of more bookkeeping overhead to manage the "bitmap" data structure --- and at the cost that the data is no longer retrieved in index order, which doesn't matter for your query but would matter if you said ORDER BY.

Questions:

  1. Why does it sort the fetched tuple-pointers again, when the index is already sorted?

  2. How does it sort with bitmap? I know what bitmap is, but I don't understand how it can be used for sorting.

  3. Why is it faster than Index Scan when fetching a moderately large percentage of the table? On the contrary, it seems to add quite a bit of computing to the process.

like image 974
netok Avatar asked Apr 12 '19 11:04

netok


People also ask

Why use a bitmap scan in PostgreSQL?

If you only select a handful of rows, PostgreSQL will decide on an index scan – if you select a majority of the rows, PostgreSQL will decide to read the table completely. But what if you read too much for an index scan to be efficient but too little for a sequential scan? The solution to the problem is to use a bitmap scan.

How does PostgreSQL indexing work?

PostgreSQL will first scan the index and compile those rows / blocks, which are needed at the end of the scan. Then PostgreSQL will take this list and go to the table to really fetch those rows. The beauty is that this mechanism even works if you are using more than just one index.

What is a bitmap scan and how do I use it?

The solution to the problem is to use a bitmap scan. The idea behind a bitmap scan is that a single block is only used once during a scan. It can also be very helpful if you want to use more than one index to scan a single table. PostgreSQL will first scan the index and compile those rows / blocks, which are needed at the end of the scan.

Is a sequential scan in PostgreSQL always bad?

PostgreSQL will filter out those unnecessary rows and just return the rest. This is really the best thing to do in this case. A sequential scan is therefore not always bad – there are use cases, where a sequential scan is actually perfect. Still: Keep in mind that scanning large tables sequentially too often will take its toll at some point.


1 Answers

The backbone of Postgres storage is made up of data pages of 8 kilobytes in a typical installation. Each data page typically holds many tuples. Read the details of physical storage in the manual.

The "bitmap" in a bitmap scan is a way to collect tuple pointers in buckets of data pages. Index sort order is necessarily lost in this process in favor of physical sort order. In "lossy mode" (which only occurs if the result is so huge or workmem so small that even the tiny bitmap won't fit) only block numbers are kept and corresponding tuple indexes discarded.

After that, each data page is only visited once from storage to extract (possibly) multiple tuples and in physical sequence, which also matters for some types of storage. In lossy mode, tuples from each identified page have to be filtered by rechecking the index condition; else tuples can be retrieved directly using collected tuple indexes.

In an index scan, each page may have to be visited multiple times if multiple tuples end up to be stored in the same data page. The actual process is even more sophisticated. Related:

  • Postgres not using index when index scan is much better option
  • Index usage on a temporary table
  • “Recheck Cond:” line in query plans with a bitmap index scan
  • How do I decompose ctid into page and row numbers?
  • Keep PostgreSQL from sometimes choosing a bad query plan

To your questions:

  1. The sorting of the index is lost due to collecting hits and reading them out data page by data page.

  2. Hence, the result has to be sorted again, in an added sort step - if the sort order is required (by ORDER BY, for instance).

  3. The number of data pages that have to be read is the most prominent factor for overall performance. Bitmap index scans reduce that number to a minimum. With faster storage, the benefit of bitmap index scans gets smaller. That's why accurate cost settings are crucial for the query planner to make good decisions.

like image 123
Erwin Brandstetter Avatar answered Oct 10 '22 16:10

Erwin Brandstetter