
Autocomplete getting data from a huge table

I need to implement autocomplete functionality, and I need to run something like this:

SELECT field FROM huge_table WHERE field LIKE '%some string%';

The table has 2 million rows, and I need many quick responses from it. We are using Postgres, and that kind of query takes forever.

Is there an efficient way of doing this with Postgres? Or should I be using something other than Postgres?

Thanks!

asked Mar 28 '13 by Guillermo


4 Answers

If you're doing autocomplete, I'm assuming that you're looking for matches based on a prefix. The standard data structure for prefix-based lookups is a trie.

If you are unable to get adequate performance from Postgres using an index and a prefix-based lookup (some string%), you can periodically run a full query of all 2 million rows and build a trie from the results, keeping it in parallel with the database.

The worst-case lookup performance of a trie is O(m), where m is the length of your prefix, so once built it will provide very fast autocomplete.
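
A minimal sketch of the idea in Python (this is not from the original answer; the Trie and complete names are made up for illustration):

class TrieNode:
    def __init__(self):
        self.children = {}    # char -> TrieNode
        self.is_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def complete(self, prefix, limit=20):
        # Walk to the node for the prefix: O(m), where m = len(prefix).
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        # Depth-first collect up to `limit` completions under that node.
        results, stack = [], [(node, prefix)]
        while stack and len(results) < limit:
            n, word = stack.pop()
            if n.is_word:
                results.append(word)
            stack.extend((child, word + ch) for ch, child in n.children.items())
        return results

Rebuilt periodically from a full scan (SELECT field FROM huge_table), it would be used like this:

trie = Trie()
for field in ("another thing", "something nice"):
    trie.insert(field)
trie.complete("anot")    # ['another thing']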

answered Nov 15 '22 by Wilduck


You could add an index to the field being searched.

Also, if you can avoid it, don't use open-ended wildcards like %some string%; they really hurt performance because no index can be used. If possible, query for some string% instead.
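
A sketch of what that could look like (the index name is illustrative; note that with a non-C locale a plain b-tree index in Postgres will not help LIKE at all, so the text_pattern_ops operator class is needed for anchored patterns):

-- B-tree index usable by left-anchored LIKE patterns.
CREATE INDEX field_prefix_idx ON huge_table (field text_pattern_ops);

-- Can use the index:
SELECT field FROM huge_table WHERE field LIKE 'some string%';

-- Cannot use it, and will scan the whole table:
SELECT field FROM huge_table WHERE field LIKE '%some string%';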

answered Oct 03 '22 by mattytommo


If you can afford the extra insert/update time, maybe you can use the pg_trgm extension.

That link includes tests against a table of 2 million records, so you can see the improvement in the best-case scenario.
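
A minimal sketch of how that could look here (the index name is illustrative):

-- Needs appropriate privileges to install the extension.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram GIN index; maintaining it is the extra insert/update cost.
CREATE INDEX field_trgm_idx ON huge_table USING gin (field gin_trgm_ops);

-- Unlike a b-tree index, this can serve unanchored patterns:
SELECT field FROM huge_table WHERE field LIKE '%some string%' LIMIT 20;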

answered Nov 15 '22 by Sergio Ayestarán


Depending on the specifics of your use case, it may be worth knowing that tsquery has a syntax for querying on prefixes of words. Combine this with an indexed tsvector field and you can get very fast lookups of word prefixes.

Create your "huge" table:

CREATE TABLE huge_table (
    field       text,
    field_tsv   tsvector
);

Add the index:

CREATE INDEX field_tsv_idx ON huge_table USING gin(field_tsv);

Add a trigger to update the indexed column:

-- Keeps field_tsv in sync with field on every write, using the English
-- text-search configuration. (EXECUTE FUNCTION needs PostgreSQL 11+;
-- on older versions use EXECUTE PROCEDURE.)
CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
ON huge_table FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(field_tsv, 'pg_catalog.english', field);

Add some mock data:

INSERT INTO huge_table (field) VALUES ('something nice');
INSERT INTO huge_table (field) VALUES ('another thing');

Then query for prefixes with some kind of limit:

SELECT field FROM huge_table WHERE field_tsv @@ to_tsquery('anot:*') LIMIT 20;
     field     
---------------
 another thing
(1 row)

Read more in the docs, especially on the index types (GIN vs. GiST), as your index may get pretty large.

answered Nov 15 '22 by Chris Farmiloe