While reading the book [The Internals of PostgreSQL], I got to Chapter 3
(book chapter link: https://www.interdb.jp/pg/pgsql03.html),
section 3.2.2. Index Scan, specifically 3.2.2.1. Start-Up Cost.
The following query is being explained there, and the cost of the index scan is estimated for it:
SELECT id, data FROM tbl WHERE data < 240;
Before estimating the cost, the numbers of index pages and index tuples, N_index,page and N_index,tuple, are obtained:
testdb=# SELECT relpages, reltuples FROM pg_class WHERE relname = 'tbl_data_idx';
 relpages | reltuples
----------+-----------
       30 |     10000
(1 row)
N_index,tuple = 10000
N_index,page = 30
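(A side note on the H_index value used below: if the pgstattuple extension is installed, the index tree height can be checked directly; pgstatindex reports it as tree_level, which for this index should be 1:)

testdb=# SELECT tree_level FROM pgstatindex('tbl_data_idx');
 tree_level
------------
          1
(1 row)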
It is mentioned that:
"The start-up cost of the index scan is the cost to read the index pages to access the first tuple in the target table, and it is defined by the following equation:"
'start-up cost' = {ceil(log2(N_index,tuple)) + (H_index + 1) × 50} × cpu_operator_cost

where H_index is the height of the index tree.

N_index,tuple is 10000; H_index is 1; cpu_operator_cost is 0.0025 (by default). Thus,

'start-up cost' = {ceil(log2(10000)) + (1 + 1) × 50} × 0.0025 = 0.285
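To double-check the arithmetic, here is a small Python sketch of that equation (the function and parameter names are mine, just mirroring the book's symbols):

import math

def index_scan_startup_cost(n_index_tuples, h_index, cpu_operator_cost=0.0025):
    # ceil(log2(N_index,tuple)): comparisons needed to descend the B-tree
    descent_comparisons = math.ceil(math.log2(n_index_tuples))
    # (H_index + 1) * 50: the per-page charge containing the 50 in question
    page_charge = (h_index + 1) * 50
    return (descent_comparisons + page_charge) * cpu_operator_cost

print(index_scan_startup_cost(10000, 1))  # ~0.285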
My question is about that equation: what is that 50? Is it a heuristic number, or does it have a statistical meaning?
Link to the costsize.c source code (which, from my narrow check, does not mention that 50 at all): https://github.com/postgres/postgres/blob/master/src/backend/optimizer/path/costsize.c#L502-L792
I tested this on a Postgres server to verify the start-up cost estimation and found:
test=# EXPLAIN SELECT id, data FROM tbl WHERE data < 240;
                                QUERY PLAN
---------------------------------------------------------------------------
 Index Scan using tbl_data_idx on tbl  (cost=0.29..13.47 rows=239 width=8)
   Index Cond: (data < 240)
(2 rows)
The start-up cost is 0.29, the same as (math.ceil(math.log2(10000)) + (1 + 1) * 50) * 0.0025 = 0.285 shown rounded to the two decimal places EXPLAIN prints, and the same as what is mentioned in the book.
Use the power of the source, Luke!
That formula is taken straight from the source. See btcostestimate in src/backend/utils/adt/selfuncs.c:
/*
 * Add a CPU-cost component to represent the costs of initial btree
 * descent.  We don't charge any I/O cost for touching upper btree levels,
 * since they tend to stay in cache, but we still have to do about log2(N)
 * comparisons to descend a btree of N leaf tuples.  We charge one
 * cpu_operator_cost per comparison.
 *
 * If there are ScalarArrayOpExprs, charge this once per SA scan.  The
 * ones after the first one are not startup cost so far as the overall
 * plan is concerned, so add them only to "total" cost.
 */
if (index->tuples > 1)      /* avoid computing log(0) */
{
    descentCost = ceil(log(index->tuples) / log(2.0)) * cpu_operator_cost;
    costs.indexStartupCost += descentCost;
    costs.indexTotalCost += costs.num_sa_scans * descentCost;
}

/*
 * Even though we're not charging I/O cost for touching upper btree pages,
 * it's still reasonable to charge some CPU cost per page descended
 * through.  Moreover, if we had no such charge at all, bloated indexes
 * would appear to have the same search cost as unbloated ones, at least
 * in cases where only a single leaf page is expected to be visited.  This
 * cost is somewhat arbitrarily set at 50x cpu_operator_cost per page
 * touched.  The number of such pages is btree tree height plus one (ie,
 * we charge for the leaf page too).  As above, charge once per SA scan.
 */
descentCost = (index->tree_height + 1) * DEFAULT_PAGE_CPU_MULTIPLIER * cpu_operator_cost;
costs.indexStartupCost += descentCost;
So the idea is that there is some cost associated with each index page descended through (including the leaf page), and that cost is "arbitrarily" pegged at 50 comparison operator executions per page: the value of DEFAULT_PAGE_CPU_MULTIPLIER in the snippet above is 50.0.
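For illustration, here is a rough Python transcription of those two charges (my own sketch, not PostgreSQL code; the names follow the C snippet above):

import math

CPU_OPERATOR_COST = 0.0025          # default value of cpu_operator_cost
DEFAULT_PAGE_CPU_MULTIPLIER = 50.0  # the "50" the question asks about

def btree_descent_startup_cost(index_tuples, tree_height):
    startup_cost = 0.0
    # One cpu_operator_cost per comparison: about log2(N) comparisons
    # to descend a btree of N leaf tuples.
    if index_tuples > 1:  # avoid computing log(0)
        startup_cost += math.ceil(math.log2(index_tuples)) * CPU_OPERATOR_COST
    # 50x cpu_operator_cost per page descended through; tree height
    # plus one, so the leaf page is charged as well.
    startup_cost += (tree_height + 1) * DEFAULT_PAGE_CPU_MULTIPLIER * CPU_OPERATOR_COST
    return startup_cost

print(btree_descent_startup_cost(10000, 1))  # ~0.285, displayed by EXPLAIN as 0.29

Note how the second term grows with a taller (e.g. bloated) index, which is exactly the point of the per-page charge described in the source comment.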