Is there a standard way to implement character-by-character typeahead autocomplete using Elasticsearch for small fields (e.g. place names)?
(At the time of writing, a search turns up a number of discussions, but nothing that seems definitive. I also see some talk about how the autocomplete/suggest support coming in Apache Lucene 4 will affect this.)
Thanks for any thoughts.
Autocomplete can be achieved by changing match queries to prefix queries. Match queries compare indexed tokens against the search query's tokens for equality, whereas prefix queries (as their name suggests) match every indexed token that starts with a search token. As a result, prefix queries usually match many more documents (results).
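A minimal Python sketch of the difference, assuming a hypothetical `name` field whose place names are indexed as single lowercase terms. It only builds the two request bodies and then imitates, in memory, how many of a few made-up terms each query would match; no cluster is contacted.

```python
import json


def match_query(field: str, text: str) -> dict:
    """Request body for a match query: `text` is analyzed into tokens, and a
    document matches when one of its indexed tokens equals a query token."""
    return {"query": {"match": {field: text}}}


def prefix_query(field: str, prefix: str) -> dict:
    """Request body for a prefix query: matches every indexed term that starts
    with `prefix`. The prefix is not analyzed, so it is lowercased here on the
    assumption that the field was indexed with a lowercasing analyzer."""
    return {"query": {"prefix": {field: prefix.lower()}}}


def count_matches(indexed_terms, typed: str):
    """In-memory imitation: (exact token matches, prefix matches)."""
    token = typed.lower()
    exact = sum(1 for t in indexed_terms if t == token)
    prefixed = sum(1 for t in indexed_terms if t.startswith(token))
    return exact, prefixed


# Hypothetical single-term place names, one per document.
TERMS = ["ams", "amstelveen", "amsterdam", "antwerp", "berlin"]

print(json.dumps(match_query("name", "ams")))
print(json.dumps(prefix_query("name", "Ams")))
print(count_matches(TERMS, "Ams"))  # (1, 3)
```

Typing "Ams" hits one document exactly but three by prefix, and the gap widens as the index grows, which is why prefix-based autocomplete usually needs a result limit.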
In the previous articles, we looked at prefix queries and the Edge NGram tokenizer for generating search-as-you-type suggestions. Suggesters are a more advanced Elasticsearch feature that returns similar-looking terms based on your text input. They suit fields such as movie, song or job titles, which have a widely known or popular order.
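As a sketch of how a popularity order can drive suggestions, the Python below builds request bodies for Elasticsearch's completion suggester and then imitates its ranking in memory. Assumptions: the body shapes follow recent releases (a 7.x-style typeless mapping and the `prefix` + `completion` request form), older versions differ, and the field name `suggest`, the suggestion name `title-suggest` and the weights are all hypothetical. The local ranking is a rough stand-in, not the suggester's actual implementation.

```python
def completion_index_body() -> dict:
    # Typeless mapping with a dedicated field of type "completion".
    return {"mappings": {"properties": {"suggest": {"type": "completion"}}}}


def suggest_doc(title: str, weight: int) -> dict:
    # Document source: the text to complete on plus a popularity weight.
    return {"suggest": {"input": [title], "weight": weight}}


def suggest_request(prefix: str, size: int = 5) -> dict:
    # "title-suggest" is an arbitrary label for this suggestion in the response.
    return {
        "suggest": {
            "title-suggest": {
                "prefix": prefix,
                "completion": {"field": "suggest", "size": size},
            }
        }
    }


def rank_locally(docs: list, prefix: str, size: int = 5) -> list:
    """Rough in-memory imitation: inputs that start with the prefix
    (case-insensitively), highest weight first, ties broken alphabetically."""
    p = prefix.lower()
    hits = [
        (doc["suggest"]["weight"], text)
        for doc in docs
        for text in doc["suggest"]["input"]
        if text.lower().startswith(p)
    ]
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [text for _, text in hits[:size]]


docs = [
    suggest_doc("Star Wars", 90),
    suggest_doc("Alien", 80),
    suggest_doc("Star Trek", 70),
    suggest_doc("Stardust", 20),
]
print(rank_locally(docs, "star"))  # ['Star Wars', 'Star Trek', 'Stardust']
```

The weight is what lets a well-known title outrank an obscure one sharing the same prefix, which plain prefix matching cannot express.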
Typeahead search, also known as autosuggest or autocomplete, filters the data by checking whether the user's input is contained in (typically, is a prefix of) the stored values. The partially matching values are then shown to the user as hints while they type.
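To make the per-keystroke filtering concrete, here is a small in-memory sketch in Python, assuming the common case where "matches" means the typed text is a prefix of the stored value (compared case-insensitively). The `Typeahead` class is hypothetical and independent of Elasticsearch. Keeping the values sorted means all candidates for a prefix sit next to each other, so each keystroke costs one binary search plus at most `limit` comparisons.

```python
from bisect import bisect_left


class Typeahead:
    """Prefix hints over a fixed list of values, e.g. place names."""

    def __init__(self, values):
        # Sort by the lowercased form so all keys sharing a prefix are adjacent.
        self._entries = sorted((v.lower(), v) for v in values)
        self._keys = [key for key, _ in self._entries]

    def hints(self, typed, limit=10):
        prefix = typed.lower()
        if not prefix:
            return []  # design choice: no hints until something is typed
        # First key >= prefix; every key starting with prefix follows contiguously.
        start = bisect_left(self._keys, prefix)
        out = []
        for key, original in self._entries[start:start + limit]:
            if not key.startswith(prefix):
                break
            out.append(original)
        return out


places = Typeahead(["Paris", "Parma", "Porto", "Prague", "Parramatta"])
for typed in ["p", "pa", "par", "parr"]:
    print(typed, places.hints(typed))
```

Each additional character narrows the hint list, which is exactly the character-by-character behaviour the question asks about.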
You can use an Edge NGram based analyzer; see http://www.elasticsearch.org/guide/reference/index-modules/analysis/edgengram-tokenizer.html
Or use the suggest plugin: https://github.com/spinscale/elasticsearch-suggest-plugin
HTH
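A sketch of the Edge NGram approach in Python, under stated assumptions: the index body uses the `edge_ngram` tokenizer type and typeless `mappings` layout of recent Elasticsearch releases (older releases, like the one the linked page documents, spelled the type `edgeNGram` and laid out mappings differently), and the field `name`, the analyzer names and the gram sizes are hypothetical. The `edge_ngrams` helper only approximates, locally, which tokens such an analyzer would index.

```python
import re


def autocomplete_index_body(min_gram: int = 1, max_gram: int = 20) -> dict:
    # Index time: split on characters that are not letters/digits, emit edge
    # n-grams, lowercase. Search time: the standard analyzer, so the typed
    # text is not n-grammed again.
    return {
        "settings": {
            "analysis": {
                "tokenizer": {
                    "autocomplete_tokenizer": {
                        "type": "edge_ngram",
                        "min_gram": min_gram,
                        "max_gram": max_gram,
                        "token_chars": ["letter", "digit"],
                    }
                },
                "analyzer": {
                    "autocomplete": {
                        "type": "custom",
                        "tokenizer": "autocomplete_tokenizer",
                        "filter": ["lowercase"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "name": {
                    "type": "text",
                    "analyzer": "autocomplete",
                    "search_analyzer": "standard",
                }
            }
        },
    }


def edge_ngrams(text: str, min_gram: int = 1, max_gram: int = 20) -> list:
    """Approximate the tokens the "autocomplete" analyzer would index."""
    grams = []
    # [^\W_]+ (Unicode letters and digits) stands in for token_chars letter+digit.
    for word in re.findall(r"[^\W_]+", text):
        word = word.lower()
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            grams.append(word[:n])
    return grams


print(edge_ngrams("New York", max_gram=4))
# ['n', 'ne', 'new', 'y', 'yo', 'yor', 'york']
```

Because every prefix is already a stored token, a plain match query on `name` answers each keystroke, trading a larger index for cheaper queries than a prefix query.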
As David wrote, you can use NGrams or the suggest plugin. With Lucene 4 it will be possible to get better auto-suggestions out of the box, without needing to maintain a separate index.
For now you can also just run a terms facet on your field and use a regex pattern to keep only the entries that start with the relevant prefix:

    {
      "facets": {
        "tag": {
          "terms": {
            "field": "field_name",
            "regex": "prefix.*"
          }
        }
      }
    }
The regex is just an example: it can be improved, and you can also make it case-insensitive with the appropriate regex flag. Also, beware that running a facet on a field that contains many unique terms is not a great idea unless you have enough memory for it.
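Facets were later removed from Elasticsearch (in 2.0) in favour of aggregations. A minimal Python sketch of the same idea with a terms aggregation, whose `include` parameter takes a regular expression that a whole term must match, assuming the hypothetical `field_name` and that the prefix should be treated literally. The `escape_regex` helper is hypothetical too, and `filter_terms_locally` only imitates the filter in memory.

```python
import re
from collections import Counter


def escape_regex(text: str) -> str:
    # Backslash-escape every non-alphanumeric character; both Lucene's regexp
    # syntax and Python's re read "\" followed by punctuation as a literal.
    return "".join(c if c.isalnum() else "\\" + c for c in text)


def prefix_terms_agg(field: str, prefix: str, size: int = 10) -> dict:
    # Terms aggregation whose "include" regex keeps only terms that start
    # with the prefix; "size": 0 skips returning the hits themselves.
    return {
        "size": 0,
        "aggs": {
            "tag": {
                "terms": {
                    "field": field,
                    "size": size,
                    "include": escape_regex(prefix) + ".*",
                }
            }
        },
    }


def filter_terms_locally(term_counts: Counter, prefix: str) -> dict:
    # In-memory imitation of the include filter. DOTALL mirrors Lucene's ".",
    # which matches any character.
    pattern = re.compile(escape_regex(prefix) + ".*", re.DOTALL)
    return {t: c for t, c in term_counts.items() if pattern.fullmatch(t)}


counts = Counter(["paris", "paris", "parma", "porto", "st. louis"])
print(prefix_terms_agg("field_name", "st."))
print(filter_terms_locally(counts, "par"))  # {'paris': 2, 'parma': 1}
```

Escaping matters for names such as "St. Louis": without it the "." would match any character.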