I have the following table/indexes -
CREATE TABLE test
(
coords geography(Point,4326),
user_id varchar(50),
created_at timestamp
);
CREATE INDEX ix_coords ON test USING GIST (coords);
CREATE INDEX ix_user_id ON test (user_id);
CREATE INDEX ix_created_at ON test (created_at DESC);
This is the query I want to execute:
select *
from updates
where ST_DWithin(coords, ST_MakePoint(-126.4, 45.32)::geography, 30000)
and user_id='3212312'
order by created_at desc
limit 60
When I run the query it only uses ix_coords
index. How can I ensure that Postgres uses ix_user_id
and ix_created_at
index as well for the query?
This is a new table in which I did bulk insert of production data. Total rows in the test
table: 15,069,489
I am running PostgreSQL 9.2.1 (with Postgis) with (effective_cache_size = 2GB). This is my local OSX with 16GB RAM, Core i7/2.5 GHz, non-SSD disk.
Adding the EXPLAIN ANALYZE
output -
Limit (cost=71.64..71.65 rows=1 width=280) (actual time=1278.652..1278.665 rows=60 loops=1)
-> Sort (cost=71.64..71.65 rows=1 width=280) (actual time=1278.651..1278.662 rows=60 loops=1)
Sort Key: created_at
Sort Method: top-N heapsort Memory: 33kB
-> Index Scan using ix_coords on test (cost=0.00..71.63 rows=1 width=280) (actual time=0.198..1278.227 rows=178 loops=1)
Index Cond: (coords && '0101000020E61000006666666666E63C40C3F5285C8F824440'::geography)
Filter: (((user_id)::text = '4f1092000b921a000100015c'::text) AND ('0101000020E61000006666666666E63C40C3F5285C8F824440'::geography && _st_expand(coords, 30000::double precision)) AND _st_dwithin(coords, '0101000020E61000006666666666E63C40C3F5285C8F824440'::geography, 30000::double precision, true))
Rows Removed by Filter: 3122459
Total runtime: 1278.701 ms
UPDATE:
Based on the suggestions below I tried index on cords + user_id:
CREATE INDEX ix_coords_and_user_id ON updates USING GIST (coords, user_id);
..but get the following error:
ERROR: data type character varying has no default operator class for access method "gist"
HINT: You must specify an operator class for the index or define a default operator class for the data type.
UPDATE:
So the CREATE EXTENSION btree_gist;
solved the btree/gist compound index issue. And now my index looks like
CREATE INDEX ix_coords_user_id_created_at ON test USING GIST (coords, user_id, created_at);
NOTE: btree_gist does not accept DESC/ASC.
New query plan:
Limit (cost=134.99..135.00 rows=1 width=280) (actual time=273.282..273.292 rows=60 loops=1)
-> Sort (cost=134.99..135.00 rows=1 width=280) (actual time=273.281..273.285 rows=60 loops=1)
Sort Key: created_at
Sort Method: quicksort Memory: 41kB
-> Index Scan using ix_updates_coords_user_id_created_at on updates (cost=0.00..134.98 rows=1 width=280) (actual time=0.406..273.110 rows=115 loops=1)
Index Cond: ((coords && '0101000020E61000006666666666E63C40C3F5285C8F824440'::geography) AND ((user_id)::text = '4e952bb5b9a77200010019ad'::text))
Filter: (('0101000020E61000006666666666E63C40C3F5285C8F824440'::geography && _st_expand(coords, 30000::double precision)) AND _st_dwithin(coords, '0101000020E61000006666666666E63C40C3F5285C8F824440'::geography, 30000::double precision, true))
Rows Removed by Filter: 1
Total runtime: 273.331 ms
The query is performing better than before, almost a second better but still not great. I guess this is the best that I can get?? I was hoping somewhere around 60-80ms. Also taking order by created_at desc
from the query, shaves off another 100ms, meaning it is unable to use the index. Anyway to fix this?
I don't know if Pg can combine a GiST index and regular b-tree indexes with a bitmap index scan, but I suspect not. You may be getting the best result you can without adding a user_id
column to your GiST index (and consequently making it bigger and slower for other queries that don't use user_id
).
As an experiment you could:
CREATE EXTENSION btree_gist;
CREATE INDEX ix_coords_and_user_id ON test USING GIST (coords, user_id);
which is likely to result in a big index, but might boost that query - if it works. Be aware that maintaining such an index will significantly slow INSERT
and UPDATE
s. If you drop the old ix_coords
your queries will use ix_coords_and_user_id
even if they don't filter on user_id
, but it'll be slower than ix_coords
. Keeping both will make the INSERT
and UPDATE
slowdown even worse.
See btree-gist
(Obsoleted by edit to question that changes the question completely; when written the user had a multicolumn index they've now split into two separate ones):
You don't seem to be filtering or sorting on user_id
, only create_date
. Pg won't (can't?) use only the second term of a multi-column index like (user_id, create_date)
, it needs use of the first item too.
If you want to index create_date
, create a separate index for it. If you use and need the (user_id, create_date)
index and don't generally use just user_id
alone, see if you can reverse the column order. Alternately create two independent indexes, (user_id)
and (create_date)
. When both columns are needed Pg can combine the two indepependent indexes using a bitmap index scan.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With