Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why are both SELECT count(PK) and SELECT count(*) so slow?

I've got a simple table with single column PRIMARY KEY called id, type serial. There is exactly 100,000,000 rows in there. Table takes 48GB, PK index ca 2,1GB. Machine running on is "dedicated" only for Postgres and it is something like Core i5, 500GB HDD, 8GB RAM. Pg config was created by pgtune utility (shared buffers ca 2GB, effective cache ca 7GB). OS is Ubuntu server 14.04, Postgres 9.3.6.

Why are both SELECT count(id) and SELECT count(*) so slow in this simple case (cca 11 minutes)?

Why is PostgreSQL planner choosing full table scan instead of index scanning which should be at least 25 times faster (in the case where it would have to read the whole index from HDD). Or where am I wrong?

Btw running the query several times in a row is not changing anything. still cca 11 minutes.

Execution plan here:

 Aggregate  (cost=7500001.00..7500001.01 rows=1 width=0) (actual time=698316.978..698316.979 rows=1 loops=1)
   Buffers: shared hit=192 read=6249809
   ->  Seq Scan on transaction  (cost=0.00..7250001.00 rows=100000000 width=0) (actual time=0.009..680594.049 rows=100000001 loops=1)
         Buffers: shared hit=192 read=6249809
 Total runtime: 698317.044 ms
like image 444
Kousalik Avatar asked May 19 '15 19:05

Kousalik


1 Answers

Considering the spec of a HDD is usually somewhere between 50Mb/s and 100Mb/s then for 48Gb I would expect to read everything between 500 and 1000s.

Since you have no where clause, the planner sees that you are interested in the large majority of records, so it does not use the index as this would require additional indexes. The reason postgresql cannot use the index lies in the MVCC which postgresql uses for transaction consistency. This requires that the rows are pulled to ensure accurate results. (see https://wiki.postgresql.org/wiki/Slow_Counting)

The cache, CPU, etc will not affect this nor changing the caching settings. This is IO bound and the cache will be completely trashed after the query.

If you can live with an approximation you can use the reltuples field in the table metadata:

SELECT reltuples FROM pg_class WHERE relname = 'tbl';

Since this is just a single row this is blazing fast.

Update: since 9.2 a new way to store the visibility information allowed index-only counts to happen. However there are quite some caveats, especially in the case where there is no predicate to limit the rows. see https://wiki.postgresql.org/wiki/Index-only_scans for more details.

like image 52
Peter Tillemans Avatar answered Nov 09 '22 00:11

Peter Tillemans