I need to do an autocomplete functionality and need to do something like this:
select field from huge_table where field like '%some string%';
The table is 2 million rows, and I need quick and several responses from it. We are using Postgres. And that kind of query will take forever.
Is there an efficient way for doing this with postgres? Or maybe should I be using some other thing than postgres?
Thanks!
If you're doing autocomplete, I'm assuming that you're looking for matches based on a prefix. The standard data-structure for prefix based lookups is a trie.
If you are unable to get adequate performance from postgres using an index, and prefix based lookup (some string%
), you can periodically do a full query of all 2 million rows, and build a trie or keep one in parallel to the database.
The worst case performance of a Trie is O(m)
, where m
is the length of your prefix, so once built it will provide for very fast autocomplete.
You could add an index to the field
being searched.
Also, if it's avoidable, don't use open ended wild cards like %some string%
they really hurt performance. If possible, do some string%
.
If you can afford the extra insert/update time maybe you can use the pg_trgm extension
You have some tests in that link with a 2 million records table to see the improvement in the best case scenario.
Depending on the specifics of your use-case, it may be worth knowing that tsquery
has a syntax for querying on prefixes of words. Combine this with a indexed tsvector
field and you can get very fast lookups of word prefixes.
Create your "huge" table:
CREATE TABLE huge_table (
field text,
field_tsv tsvector
);
Add the index:
CREATE INDEX field_tsv_idx ON huge_table USING gin(field_tsv);
Add a trigger to update the indexed column:
CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
ON huge_table FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(field_tsv, 'pg_catalog.english', field);
Add some mock data
INSERT INTO huge_table (field) VALUES ('something nice');
INSERT INTO huge_table (field) VALUES ('another thing');
Then query for prefixes with some kind of limit:
SELECT field FROM huge_table WHERE field_tsv @@ to_tsquery('anot:*') LIMIT 20;
field
---------------
another thing
(1 row)
Read more on the docs especially on the index types as your index may get pretty large.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With