I create a table with 43kk rows, populate them with values 1..200. So ~220k per each number spreaded through the table.
create table foo (id integer primary key, val bigint);
insert into foo
select i, random() * 200 from generate_series(1, 43000000) as i;
create index val_index on foo(val);
vacuum analyze foo;
explain analyze select id from foo where val = 55;
Result: http://explain.depesz.com/s/fdsm
I expect total runtime < 1s, is it possible? I have SSD, core i5 (1,8), 4gb RAM. 9,3 Postgres.
If I use Index Only scan it works very fast:
explain analyze select val from foo where val = 55;
http://explain.depesz.com/s/7hm
But I need to select id not val so Incex Only scan is not suitable in my case.
Thanks in advance!
Additional info:
SELECT relname, relpages, reltuples::numeric, pg_size_pretty(pg_table_size(oid))
FROM pg_class WHERE oid='foo'::regclass;
Result:
"foo";236758;43800000;"1850 MB"
Config:
"cpu_index_tuple_cost";"0.005";""
"cpu_operator_cost";"0.0025";""
"cpu_tuple_cost";"0.01";""
"effective_cache_size";"16384";"8kB"
"max_connections";"100";""
"max_stack_depth";"2048";"kB"
"random_page_cost";"4";""
"seq_page_cost";"1";""
"shared_buffers";"16384";"8kB"
"temp_buffers";"1024";"8kB"
"work_mem";"204800";"kB"
PostgreSQL doesn't support BITMAP index. You can use BRIN index in some cases.
A bitmap heap scan takes a row location bitmap generated by a Bitmap Index Scan (either directly, or through a series of bitmap set operations via BitmapAnd and BitmapOr nodes) and looks up the relevant data.
Blocks are handed out one at a time, so that access to the table remains sequential. In a parallel bitmap heap scan, one process is chosen as the leader. That process performs a scan of one or more indexes and builds a bitmap indicating which table blocks need to be visited.
An index-only scan, after finding a candidate index entry, checks the visibility map bit for the corresponding heap page. If it's set, the row is known visible and so the data can be returned with no further work.
I have got answer here: http://ask.use-the-index-luke.com/questions/235/postgresql-bitmap-heap-scan-on-index-is-very-slow-but-index-only-scan-is-fast
The trick is to use composite index for id and value:
create index val_id_index on foo(val, id);
So Index Only scan will be used, but I can select id now.
select id from foo where val = 55;
Result:
http://explain.depesz.com/s/nDt3
But this works ONLY in Postgres with version 9.2+. If you have forced to use versions below try another options.
Although you're querying only 0,5% of the table, or ~10MB worth of data (out of nearly 2GB table), values of interest are spread evenly across whole table.
You can see it in the first plan you've provided:
BitmapIndexScan
completes in 123.172msBitmapHeapScan
takes 17055.046ms.You can try clustering your tables based on index order, which will put rows together on the same pages. On my SATA disks I have the following:
SET work_mem TO '300MB';
EXPLAIN (analyze,buffers) SELECT id FROM foo WHERE val = 55;
Bitmap Heap Scan on foo (...) (actual time=90.315..35091.665 rows=215022 loops=1)
Heap Blocks: exact=140489
Buffers: shared hit=20775 read=120306 written=24124
SET maintenance_work_mem TO '1GB';
CLUSTER foo USING val_index;
EXPLAIN (analyze,buffers) SELECT id FROM foo WHERE val = 55;
Bitmap Heap Scan on foo (...) (actual time=49.215..407.505 rows=215022 loops=1)
Heap Blocks: exact=1163
Buffers: shared read=1755
Of course, this is a one-time operation and it'll get longer bit-by-bit over the time.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With