I'm using Postgres 9.1 and have a horribly slow performing query.
Explain Analyze SELECT COUNT(DISTINCT email) FROM "invites" WHERE (
created_at < '2012-10-10 21:08:05.259200'
AND invite_method = 'email'
AND accept_count = 0
AND reminded_count < 3
AND (last_reminded_at IS NULL OR last_reminded_at < '2012-10-10 21:08:05.261483'))
Aggregate (cost=19828.24..19828.25 rows=1 width=21) (actual time=11395.903..11395.903 rows=1 loops=1)
-> Seq Scan on invites (cost=0.00..18970.57 rows=343068 width=21) (actual time=0.036..353.121 rows=337143 loops=1)
Filter: ((created_at < '2012-10-10 21:08:05.2592'::timestamp without time zone) AND (reminded_count < 3) AND ((last_reminded_at IS NULL) OR (last_reminded_at < '2012-10-10 21:08:05.261483'::timestamp without time zone)) AND ((invite_method)::text = 'email'::text) AND (accept_count = 0))
Total runtime: 11395.970 ms
As you can see, this is taking about 11 seconds. How would I go about adding an index to optimize this query's performance?
Just indexing "everything" like Jim advises is not a very efficient strategy. Indexes carry a cost to maintain and combining many individual indexes is more expensive (to maintain and to query) than one tailored index. It always depends on your complete situation.
The cost of indexes is low for read-only or rarely written tables, but high for volatile tables with lots of write operations. An additional downside is that an update changing an indexed column cannot be performed as a HOT update (Heap-Only Tuple).
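To get a rough idea of how write-heavy the table is, and how many of its updates currently run as HOT updates, you could look at the statistics collector. This is only a sketch: it assumes the statistics collector is enabled and that the counters have had time to accumulate.

```sql
-- n_tup_upd:     total number of rows updated
-- n_tup_hot_upd: number of those updates performed as HOT updates
SELECT relname, n_tup_ins, n_tup_upd, n_tup_hot_upd, n_tup_del
FROM   pg_stat_user_tables
WHERE  relname = 'invites';
```

If n_tup_hot_upd makes up a large share of n_tup_upd today, adding indexes on the frequently updated columns is likely to reduce that share.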
If performance of the particular query is important, a partial multi-column index would be a good strategy. Specialized, but a lot cheaper and faster than individual indexes on all involved columns. The rule of thumb is to ...
WHERE clause to narrow down the partition of the index.
Judging from your column names (for lack of information), accept_count = 0 seems to be the most selective (and stable) filter here, while created_at and last_reminded_at probably keep changing. So maybe something like this:
CREATE INDEX invites_special_idx
ON invites (created_at, last_reminded_at)
WHERE accept_count = 0
AND invite_method = 'email'
AND reminded_count < 3;
Sort created_at and last_reminded_at ascending to match the query perfectly - which happens to be the default anyway. This way, the system can get all relevant rows in a single scan from the top of the index. Should be very fast.
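To check whether the planner actually picks up the partial index, you can refresh the statistics and re-run the query from the question under EXPLAIN ANALYZE. A partial index is only considered when the query's WHERE clause implies the index predicate, so the conditions on accept_count, invite_method and reminded_count have to stay in the query as written. Whether the index gets used also depends on how large a share of the table matches; the planner may still prefer a sequential scan.

```sql
-- Refresh planner statistics after creating the index
ANALYZE invites;

-- Same query as in the question; look for invites_special_idx in the plan
EXPLAIN ANALYZE
SELECT COUNT(DISTINCT email)
FROM   invites
WHERE  created_at < '2012-10-10 21:08:05.259200'
AND    invite_method = 'email'
AND    accept_count = 0
AND    reminded_count < 3
AND   (last_reminded_at IS NULL OR last_reminded_at < '2012-10-10 21:08:05.261483');
```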
As we discussed in one of your previous questions, it may be of additional help to cluster the table on the index. Be sure to read the manual about CLUSTER.
As @Craig pointed out, you can't CLUSTER on a partial index. Since CLUSTER is a one-time operation (effects degrade with later write operations), you could circumvent this restriction by creating a full index, clustering the table on it and dropping the index again. Like:
CREATE INDEX invites_special_idx2 ON invites (created_at, last_reminded_at);
CLUSTER invites USING invites_special_idx2;
DROP INDEX invites_special_idx2;
CLUSTER is only useful while there aren't other important queries with contradicting requirements for data distribution.
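Since the effect of CLUSTER wears off with later writes, it can help to check from time to time how well the physical row order still follows created_at. A minimal sketch, assuming the table lives in the public schema:

```sql
-- Refresh planner statistics; the manual recommends ANALYZE after CLUSTER
ANALYZE invites;

-- correlation close to 1 or -1: physical row order closely follows the column order
-- correlation close to 0: rows are scattered, the clustering effect is mostly gone
SELECT attname, correlation
FROM   pg_stats
WHERE  schemaname = 'public'      -- assumption: default schema
AND    tablename  = 'invites'
AND    attname    = 'created_at';
```

If the correlation has dropped a lot since the last CLUSTER, repeating the three steps above may be worth the exclusive lock that CLUSTER takes on the table.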
PostgreSQL 9.2 has a couple of new features that would make your query faster. In particular index-only scans (first item in the release notes). Consider upgrading.