
Autocomplete getting data from a huge table

I need to implement autocomplete functionality, and I need to run something like this:

SELECT field FROM huge_table WHERE field LIKE '%some string%';

The table has 2 million rows, and I need many quick responses from it. We are using Postgres, and that kind of query takes forever.

Is there an efficient way of doing this with Postgres? Or should I be using something other than Postgres?

Thanks!

asked Mar 28 '13 by Guillermo


4 Answers

If you're doing autocomplete, I'm assuming that you're looking for matches based on a prefix. The standard data structure for prefix-based lookups is a trie.

If you are unable to get adequate performance from Postgres using an index and a prefix-based lookup (some string%), you can periodically run a full query of all 2 million rows and build a trie from the results, keeping it in parallel with the database.

The worst-case lookup performance of a trie is O(m), where m is the length of your prefix, so once built it will provide very fast autocomplete.
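
A minimal sketch of the idea in Python (this is not from the original answer; the Trie and complete names are made up for illustration):

class TrieNode:
    def __init__(self):
        self.children = {}    # char -> TrieNode
        self.is_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def complete(self, prefix, limit=20):
        # Walk to the node for the prefix: O(m), where m = len(prefix).
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        # Depth-first collect up to `limit` completions under that node.
        results, stack = [], [(node, prefix)]
        while stack and len(results) < limit:
            n, word = stack.pop()
            if n.is_word:
                results.append(word)
            stack.extend((child, word + ch) for ch, child in n.children.items())
        return results

Rebuilt periodically from a full scan (SELECT field FROM huge_table), it would be used like this:

trie = Trie()
for field in ("another thing", "something nice"):
    trie.insert(field)
trie.complete("anot")    # ['another thing']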

answered Nov 15 '22 by Wilduck


You could add an index to the field being searched.

Also, if you can avoid it, don't use open-ended wildcards like %some string%; they really hurt performance because no index can be used. If possible, query for some string% instead.
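
A sketch of what that could look like (the index name is illustrative; note that with a non-C locale a plain b-tree index in Postgres will not help LIKE at all, so the text_pattern_ops operator class is needed for anchored patterns):

-- B-tree index usable by left-anchored LIKE patterns.
CREATE INDEX field_prefix_idx ON huge_table (field text_pattern_ops);

-- Can use the index:
SELECT field FROM huge_table WHERE field LIKE 'some string%';

-- Cannot use it, and will scan the whole table:
SELECT field FROM huge_table WHERE field LIKE '%some string%';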

answered Oct 03 '22 by mattytommo


If you can afford the extra insert/update time, maybe you can use the pg_trgm extension.

That link includes tests against a table of 2 million records, so you can see the improvement in the best-case scenario.
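
A minimal sketch of how that could look here (the index name is illustrative):

-- Needs appropriate privileges to install the extension.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram GIN index; maintaining it is the extra insert/update cost.
CREATE INDEX field_trgm_idx ON huge_table USING gin (field gin_trgm_ops);

-- Unlike a b-tree index, this can serve unanchored patterns:
SELECT field FROM huge_table WHERE field LIKE '%some string%' LIMIT 20;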

answered Nov 15 '22 by Sergio Ayestarán


Depending on the specifics of your use case, it may be worth knowing that tsquery has a syntax for querying on prefixes of words. Combine this with an indexed tsvector field and you can get very fast lookups of word prefixes.

Create your "huge" table:

CREATE TABLE huge_table (
    field       text,
    field_tsv   tsvector
);

Add the index:

CREATE INDEX field_tsv_idx ON huge_table USING gin(field_tsv);

Add a trigger to update the indexed column:

-- Keeps field_tsv in sync with field on every write, using the English
-- text-search configuration. (EXECUTE FUNCTION needs PostgreSQL 11+;
-- on older versions use EXECUTE PROCEDURE.)
CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
ON huge_table FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(field_tsv, 'pg_catalog.english', field);

Add some mock data:

INSERT INTO huge_table (field) VALUES ('something nice');
INSERT INTO huge_table (field) VALUES ('another thing');

Then query for prefixes with some kind of limit:

SELECT field FROM huge_table WHERE field_tsv @@ to_tsquery('anot:*') LIMIT 20;
     field     
---------------
 another thing
(1 row)

Read more in the docs, especially on the index types (GIN vs. GiST), as your index may get pretty large.

answered Nov 15 '22 by Chris Farmiloe