
Supporting typeahead autocomplete with ElasticSearch

Is there a standard way to implement character-by-character typeahead autocomplete using ElasticSearch for small fields (e.g. place names)?

At the time of writing, a search turns up a number of discussions, but nothing that seems definitive. I also see talk of autocomplete/suggest feature support in Apache Lucene 4 and the effect it may have here.

Thanks for thoughts.

wodow asked Nov 28 '12 20:11



2 Answers

You can use an Edge NGram based analyzer; see http://www.elasticsearch.org/guide/reference/index-modules/analysis/edgengram-tokenizer.html

Or use the suggest plugin: https://github.com/spinscale/elasticsearch-suggest-plugin
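For the Edge NGram route, here is a minimal sketch in Python of what the index settings might look like and what the filter does to a token. The analyzer and filter names (`autocomplete`, `autocomplete_edge`) and the gram sizes are hypothetical, and the filter type name `edgeNGram` is an assumption that may differ between Elasticsearch versions. The `edge_ngrams` helper only emulates the tokenization locally for illustration; it is not part of any Elasticsearch client.

```python
import json


def edge_ngrams(token, min_gram=1, max_gram=20):
    """Emulate front edge n-grams for a single token (illustration only)."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


# Hypothetical index settings: a custom analyzer that lowercases each token
# and then expands it into its leading prefixes at index time.
index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "autocomplete_edge": {
                    "type": "edgeNGram",  # assumption: name varies by version
                    "min_gram": 1,
                    "max_gram": 20,
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "autocomplete_edge"],
                }
            },
        }
    }
}

if __name__ == "__main__":
    # Each prefix becomes an indexed term, so a plain term match on what the
    # user has typed so far behaves like a prefix match on the original token.
    print(edge_ngrams("paris"))
    print(json.dumps(index_settings, indent=2))
```

With this kind of setup the typed text would usually be analyzed without the n-gram filter at search time, so that "par" is matched as one term against the indexed prefixes rather than being split into prefixes itself.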

HTH

dadoonet answered Sep 24 '22 03:09


As David wrote, you can use NGrams or the suggest plugin. With Lucene 4 it will be possible to have better auto-suggestions out of the box, without the need to maintain a separate index.

For now you can also just make a terms facet on your field and use a regex pattern to keep only the entries that start with the relevant prefix:

"facets" : {
    "tag" : {
        "terms" : {
            "field" : "field_name",
            "regex" : "prefix.*"
        }
    }
}

The regex is just an example; it can be improved, and you can also make it case-insensitive using the proper regex flag. Also, beware that faceting on a field that contains many unique terms is not a great idea unless you have enough memory for it.
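As a rough sketch of how the facet request above could be built from whatever the user has typed, here is a small Python helper. The function name `prefix_facet_body`, the facet name `suggestions`, and the `size` default are hypothetical. It assumes an Elasticsearch version that still supports facets, that the terms facet accepts `regex_flags`, and that Python's `re.escape` output is acceptable to the Java regex engine on the server side.

```python
import json
import re


def prefix_facet_body(field, prefix, size=10):
    """Build a search body whose terms facet keeps only terms starting with prefix."""
    # Escape the user input so characters like "." or "(" are matched literally.
    pattern = re.escape(prefix) + ".*"
    return {
        "query": {"match_all": {}},
        "size": 0,  # only the facet counts are needed, not the hits
        "facets": {
            "suggestions": {
                "terms": {
                    "field": field,
                    "regex": pattern,
                    "regex_flags": "CASE_INSENSITIVE",  # assumption: Java Pattern flag name
                    "size": size,
                }
            }
        },
    }


if __name__ == "__main__":
    body = prefix_facet_body("field_name", "par")
    print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized and sent as the body of a search request; the facet entries in the response would then serve as the suggestion list, ordered by term count.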

javanna answered Sep 26 '22 03:09