Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Elastic search or Trie for search/autocomplete?

My understanding how autocomplete/search for text/item works at high level in any scalable product like Amazon eCommerce/Google at high level was :-

Elastic Search(ES) based approach

  1. Documents are stored in DB . Once persisted given to Elastic search, It creates the index and store the index/document(based on tokenizer) in memory or disk based configuration.

  2. Once user types say 3 characters, it search all index under ES(Can be configured to index even ngram) , Rank them based on weightage and return to user

But after reading couple of resources on google like Trie based search

Looks some of the scalable product also uses Trie data stucture to do the prefix based search.

My question Is Can trie based approach be good alternative to ES or ES internally uses Trie or am i missing completely here ?

like image 310
user3198603 Avatar asked Jun 28 '18 14:06

user3198603


People also ask

Does Google Autocomplete use Trie?

Google also stores each word/sentence in the form of a trie.

Does elastic search use Trie?

Looks some of the scalable product also uses Trie data stucture to do the prefix based search.

How does Elasticsearch implement autocomplete?

Autocomplete can be achieved by changing match queries to prefix queries. While match queries work on token (indexed) to token (search query tokens) match, prefix queries (as their name suggests) match all the tokens starting with search tokens, hence the number of documents (results) matched is high.

Is elastic search fast?

Elasticsearch is fast. Because Elasticsearch is built on top of Lucene, it excels at full-text search. Elasticsearch is also a near real-time search platform, meaning the latency from the time a document is indexed until it becomes searchable is very short — typically one second.


1 Answers

ES autocompletion can be achieved in two ways:

  1. using prefix queries
  2. either using (edge-)ngrams
  3. or using the completion suggester

The first option is the poor man's completion feature. I'm mentioning it because it can be useful in certain situation but you should avoid it if you have a substantial amount of documents.

The second option uses the conventional ES indexing features, i.e. it will tokenize the text, all (edge-)ngrams will be indexed and then you can search for any prefix/infix/suffix that have been indexed.

The third option uses a different approach and is optimized for speed. Basically, when indexing a field of type completion, ES will create a "finite state transducer" and store it in memory for ultra fast access.

A finite state transducer is close to a trie in terms of implementation. You can check this excellent article which shows how trie compares to finite state transducer

UPDATE (June 25th, 2019):

ES 7.2 introduced a new data type called search_as_you_type that allows this kind of behavior natively. Read more at: https://www.elastic.co/guide/en/elasticsearch/reference/7.2/search-as-you-type.html

like image 128
Val Avatar answered Sep 19 '22 14:09

Val