Understanding postgres explain w/ bitmap heap/index scans

Tags:

postgresql

I have a table w/ 4.5 million rows. There is no primary key. The table has a column p_id, type integer. There's an index, idx_mytable_p_id on this column using the btree method. I do:

SELECT * FROM mytable WHERE p_id = 123456;

I run an explain on this and see the following output:

Bitmap Heap Scan on mytable  (cost=12.04..1632.35 rows=425 width=321)
  Recheck Cond: (p_id = 543094)
  ->  Bitmap Index Scan on idx_mytable_p_id  (cost=0.00..11.93 rows=425 width=0)
        Index Cond: (p_id = 543094)

Questions:

Why is that query doing a heap scan and then a bitmap index scan?
Why is it examining 425 rows? Why is the width of the operation 321?
What is the cost of 12.04..1632.35 and 0.00..11.93 telling me?

For the record there are 773 rows with the p_id value of 123456. There are 38 columns on mytable.

Thanks!

asked Apr 13 '12 16:04

Wells

2 Answers

Why is that query doing a heap scan and then a bitmap index scan?

It's not, exactly. EXPLAIN output shows the structure of the execution nodes, with the ones on the "higher" level (not indented as far) pulling rows from the nodes below them. So when the Bitmap Heap Scan node goes to pull its first row, the Bitmap Index Scan runs to determine the set of rows to be used and hands that set to the heap scan as a bitmap of row locations. The index scan walks the index to determine which rows need to be read, and the heap scan actually reads them. The idea is that by visiting the matching heap pages in physical order rather than in index order it will do less random access -- all matching rows from a given page will be read when that page is loaded, and enough pages may be read in order to use cheaper sequential access rather than seeking back and forth all over the disk.
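A toy model may make the two-step flow easier to picture. This is only a sketch, not how PostgreSQL implements it: the names `heap_pages`, `index_entries`, `bitmap_index_scan` and `bitmap_heap_scan` are made up for illustration, and it assumes the bitmap records only page numbers (a "lossy" bitmap), which is the case where the Recheck Cond has to be applied to every tuple on a fetched page.

```python
def bitmap_index_scan(index_entries, key):
    """Walk the (hypothetical) index and collect the heap pages with a match."""
    return {page_no for k, page_no, _slot in index_entries if k == key}


def bitmap_heap_scan(heap_pages, page_bitmap, key):
    """Visit the flagged pages in physical order and recheck every tuple."""
    rows = []
    for page_no in sorted(page_bitmap):      # physical order, not index order
        for row in heap_pages[page_no]:
            if row["p_id"] == key:           # the "Recheck Cond" step
                rows.append(row)
    return rows


# A tiny made-up heap: page number -> tuples stored on that page.
heap_pages = {
    0: [{"p_id": 7, "v": "a"}, {"p_id": 3, "v": "b"}],
    1: [{"p_id": 5, "v": "c"}],
    2: [{"p_id": 7, "v": "d"}, {"p_id": 7, "v": "e"}],
}

# A stand-in for the btree: (key, page, slot) entries kept in key order.
index_entries = sorted(
    (row["p_id"], page_no, slot)
    for page_no, rows in heap_pages.items()
    for slot, row in enumerate(rows)
)

page_bitmap = bitmap_index_scan(index_entries, 7)
result = bitmap_heap_scan(heap_pages, page_bitmap, 7)
print(sorted(page_bitmap), [r["v"] for r in result])
```

Page 1 is never touched, and page 2 is loaded once even though it holds two matching rows, which is the saving described above.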

Why is it examining 425 rows?

It's not. You ran EXPLAIN, which just shows you the chosen plan and its estimates; it doesn't examine any rows at all. That makes the value of EXPLAIN rather limited compared to running EXPLAIN ANALYZE, which actually runs the query and shows you both the estimates and the actual numbers.

Why is the width of the operation 321?

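That's the estimated average size, in bytes, of the rows the node outputs. Since the query is SELECT *, that is the estimated average size of a whole tuple in mytable.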

What is the cost of 12.04..1632.35 and 0.00..11.93 telling me?

The first number is the cost to return the first row from that node; the second number is the cost to return all of the rows for that node. Remember, these are estimates. The unit is an abstract cost unit. The absolute number means nothing; what matters in planning is which plan has the lowest cost. If you are using a cursor the first number matters; otherwise it is usually the second number. (I think it interpolates for a LIMIT clause.)
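To make the startup/total distinction concrete, here is a rough sketch of the kind of interpolation mentioned for LIMIT. It is a simplified linear model under my own assumptions, not the planner's actual code; the function name `limit_cost` is hypothetical, and the numbers are the ones from the Bitmap Heap Scan line in the question.

```python
def limit_cost(startup, total, est_rows, limit):
    """Estimate the cost of fetching only the first `limit` rows of a node.

    Assumes the run cost (total - startup) is spread evenly over the
    estimated rows, so it is just a linear interpolation.
    """
    fraction = min(limit, est_rows) / est_rows
    return startup + (total - startup) * fraction


# Figures from the plan: cost=12.04..1632.35 rows=425
heap_startup, heap_total, heap_rows = 12.04, 1632.35, 425

cost_first_10 = limit_cost(heap_startup, heap_total, heap_rows, 10)
cost_all = limit_cost(heap_startup, heap_total, heap_rows, heap_rows)
print(round(cost_first_10, 2), round(cost_all, 2))
```

With a small limit the estimate stays close to the startup cost, which is why the first number matters most when only a few rows are fetched.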

It is often necessary to adjust configurable cost factors, such as random_page_cost and cpu_tuple_cost, to accurately model the costs within your environment. Without such adjustments the comparative costs are likely to not match the corresponding run times, so a less-than-optimal plan might be chosen.
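As an illustration of how those factors feed into an estimate, the sketch below uses the simplest case the manual describes, a sequential scan with no WHERE clause: pages read times seq_page_cost plus rows scanned times cpu_tuple_cost. The function name and the table size (358 pages, 10000 rows) are just example values, and the plan in the question is a bitmap scan whose formula is more involved, so treat this only as a picture of the mechanism.

```python
def seq_scan_cost(pages, rows, seq_page_cost=1.0, cpu_tuple_cost=0.01):
    """Simplified cost of a full sequential scan with no filter.

    The keyword defaults mirror PostgreSQL's default settings for
    seq_page_cost and cpu_tuple_cost.
    """
    return pages * seq_page_cost + rows * cpu_tuple_cost


# Example table: 358 pages holding 10000 rows.
default_cost = seq_scan_cost(pages=358, rows=10000)

# Same table, but with CPU work per tuple modelled as five times dearer.
cpu_heavy_cost = seq_scan_cost(pages=358, rows=10000, cpu_tuple_cost=0.05)

print(default_cost, cpu_heavy_cost)
```

Changing a factor shifts the estimate for every plan that uses it, so the relative ranking of candidate plans can change even though the data has not.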

answered Sep 17 '22 14:09

kgrittn

re 1) execution plans have to be read from the innermost node to the outermost node. So it's first doing an index scan (to find the rows) and then accessing the actual table to return the rows the index scan found

re 2) the number of rows shown in the plan is just an estimation based on the statistics and as such 425 vs. 773 sounds fairly reasonable. If you want to see real figures, use explain analyze

re 3) the first number in the cost figure is the "startup" cost to initialize that step of the plan, the second is the total cost of that step.

This is all documented in the manual: http://www.postgresql.org/docs/current/static/using-explain.html

You might want to go through these links in the PostgreSQL Wiki as well:

PostgreSQL EXPLAIN
Using Explain

answered Sep 16 '22 14:09

a_horse_with_no_name
