I have a simple table in Postgres with a bit over 8 million rows. The column of interest holds short text strings, typically one or more words with a total length of less than 100 characters. It is declared as 'character varying (100)'. The column is indexed. A simple lookup like the one below takes > 3000 ms.
SELECT a, b, c FROM t WHERE a LIKE '?%'
Yes, for now, the need is to simply find the rows where "a" starts with the entered text. I want to bring the speed of look up down to under 100 ms (the appearance of instantaneous). Suggestions? Seems to me that full text search won't help here as my column of text is too short, but I would be happy to try that if worthwhile.
Oh, btw I also loaded the exact same data in mongodb and indexed column "a". Loading the data in mongodb was amazingly quick (mongodb++). Both mongodb and Postgres are pretty much instantaneous when doing exact lookups. But, Postgres actually shines when doing trailing wildcard searches as above, consistently taking about 1/3 as long as mongodb. I would be happy to pursue mongodb if I could speed that up as this is only a readonly operation.
Update: First, a couple of EXPLAIN ANALYZE outputs
EXPLAIN ANALYZE SELECT a, b, c FROM t WHERE a LIKE 'abcd%'
"Seq Scan on t (cost=0.00..282075.55 rows=802 width=40)
(actual time=1220.132..1220.132 rows=0 loops=1)"
" Filter: ((a)::text ~~ 'abcd%'::text)"
"Total runtime: 1220.153 ms"
I actually want to compare Lower(a) with the search term, which is always at least 4 characters long, so
EXPLAIN ANALYZE SELECT a, b, c FROM t WHERE Lower(a) LIKE 'abcd%'
"Seq Scan on t (cost=0.00..302680.04 rows=40612 width=40)
(actual time=4.681..3321.387 rows=788 loops=1)"
" Filter: (lower((a)::text) ~~ 'abcd%'::text)"
"Total runtime: 3321.504 ms"
So I created an index
CREATE INDEX idx_t ON t USING btree (Lower(Substring(a, 1, 4) ));
"Seq Scan on t (cost=0.00..302680.04 rows=40612 width=40)
(actual time=3243.841..3243.841 rows=0 loops=1)"
" Filter: (lower((a)::text) = 'abcd%'::text)"
"Total runtime: 3243.860 ms"
Seems the only time an index is being used is when I am looking for an exact match
EXPLAIN ANALYZE SELECT a, b, c FROM t WHERE a = 'abcd'
"Index Scan using idx_t on geonames (cost=0.00..57.89 rows=13 width=40)
(actual time=40.831..40.923 rows=17 loops=1)"
" Index Cond: ((ascii_name)::text = 'Abcd'::text)"
"Total runtime: 40.940 ms"
Found a solution by implementing an index with varchar_pattern_ops, and am now looking for even quicker lookups.
The PostgreSQL query planner is smart, but not an AI. To make it use an index on an expression, use the exact same form of the expression in the query.
With an index like this:
CREATE INDEX t_a_lower_idx ON t (lower(substring(a, 1, 4)));
Or simpler in PostgreSQL 9.1 or later:
CREATE INDEX t_a_lower_idx ON t (lower(left(a, 4)));
Use this query:
SELECT * FROM t WHERE lower(left(a, 4)) = 'abcd';
Which, for a search term of exactly 4 characters, is 100% functionally equivalent to:
SELECT * FROM t WHERE lower(a) LIKE 'abcd%'
Or:
SELECT * FROM t WHERE a ILIKE 'abcd%'
But not:
SELECT * FROM t WHERE a LIKE 'abcd%'
This is a functionally different query and you need a different index:
CREATE INDEX t_a_idx ON t (substring(a, 1, 4));
Or simpler with PostgreSQL 9.1 or later:
CREATE INDEX t_a_idx ON t (left(a, 4));
And use this query:
SELECT * FROM t WHERE left(a, 4) = 'abcd';
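The equivalence claims above can be checked outside the database. What follows is a minimal Python sketch, not PostgreSQL itself: the sample values and helper names are made up for illustration, and str.lower / str.startswith only approximate lower() and LIKE for plain ASCII data. It also shows why the equality form only works when the search term has exactly the length used in the index expression.

```python
# Hypothetical sample values for column "a" (not taken from the original table).
rows = ["Abcd", "abcdef", "ABCD river", "abc", "xabcd", "Abce", "abcd"]

def expr_index_match(a, term):
    # Models: lower(left(a, 4)) = term
    return a[:4].lower() == term

def lower_like_match(a, term):
    # Models: lower(a) LIKE term || '%'
    return a.lower().startswith(term)

def like_match(a, term):
    # Models: a LIKE term || '%'  (case sensitive)
    return a.startswith(term)

term = "abcd"  # exactly 4 characters, the same length as in the index expression
by_expr = [a for a in rows if expr_index_match(a, term)]
by_lower_like = [a for a in rows if lower_like_match(a, term)]
by_like = [a for a in rows if like_match(a, term)]

print(by_expr == by_lower_like)  # the two case-insensitive forms agree
print(by_like)                   # the case-sensitive form returns fewer rows

# With a 5-character term the equality form can never match,
# because left(a, 4) yields at most 4 characters.
print(expr_index_match("abcdef", "abcde"), lower_like_match("abcdef", "abcde"))
```

So the equality on lower(left(a, 4)) is a good fit only if the application always searches with a 4-character prefix; longer terms need the pattern-matching variant.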
Case insensitive. Index:
Edit: Almost forgot: If you run your db with any locale other than 'C', you need to specify the operator class explicitly (text_pattern_ops in my example):
CREATE INDEX t_a_lower_idx
ON t (lower(left(a, <insert_max_length>)) text_pattern_ops);
Query:
SELECT * FROM t WHERE lower(left(a, <insert_max_length>)) ~~ 'abcdef%';
This can utilize the index and is almost as fast as the variant with a fixed length.
You may be interested in this post on dba.SE with more details about pattern matching, especially the last part about the operators ~>=~ and ~<~.
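As I understand it, the idea behind those operators is that a left-anchored pattern can be rewritten as a range condition, which a sorted index can answer with two binary searches instead of a full scan. Here is a minimal Python sketch of that idea, assuming character-by-character ('C'-style) ordering and plain ASCII data. The sample values and the helpers prefix_upper_bound and prefix_range are made up for illustration and are not PostgreSQL functions.

```python
import bisect

def prefix_upper_bound(prefix):
    # Smallest string that sorts after every string starting with prefix:
    # bump the last character by one (assumes it is not the maximum code point).
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)

def prefix_range(sorted_keys, prefix):
    # Models: key >= prefix AND key < upper_bound, evaluated on a sorted index.
    if not prefix:
        return list(sorted_keys)
    lo = bisect.bisect_left(sorted_keys, prefix)
    hi = bisect.bisect_left(sorted_keys, prefix_upper_bound(prefix))
    return sorted_keys[lo:hi]

# Hypothetical index contents: lowercased values, sorted by code point.
keys = sorted(s.lower() for s in
              ["Abcd", "abcdef", "ABCD river", "abc", "xabcd", "Abce", "abd"])

print(prefix_range(keys, "abcd"))  # ['abcd', 'abcd river', 'abcdef']
```

The range 'abcd' <= key < 'abce' returns exactly the keys that start with 'abcd', which is why a B-tree index with a suitable operator class can serve LIKE 'abcd%'.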