I'm just getting started with Elasticsearch and am trying to implement an autocomplete feature with it. I have an autocomplete index with a field city of type string. Here's an example of a document stored in this index:
{
  "_index": "autocomplete_1435797593949",
  "_type": "listing",
  "_id": "40716",
  "_source": {
    "city": "Rome",
    "tags": [
      "listings"
    ]
  }
}
The analysis configuration looks like this:
{
  "analyzer": {
    "autocomplete_term": {
      "tokenizer": "autocomplete_edge",
      "filter": [
        "lowercase"
      ]
    },
    "autocomplete_search": {
      "tokenizer": "keyword",
      "filter": [
        "lowercase"
      ]
    }
  },
  "tokenizer": {
    "autocomplete_edge": {
      "type": "nGram",
      "min_gram": 1,
      "max_gram": 100
    }
  }
}
The mappings:
{
  "autocomplete_1435795884170": {
    "mappings": {
      "listing": {
        "properties": {
          "city": {
            "type": "string",
            "analyzer": "autocomplete_term"
          }
        }
      }
    }
  }
}
I'm sending the following query to ES:
{
  "query": {
    "multi_match": {
      "query": "Rio",
      "analyzer": "autocomplete_search",
      "fields": [
        "city"
      ]
    }
  }
}
As a result, I get the following:
{
"took":2,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":1,
"max_score":2.7742395,
"hits":[
{
"_index":"autocomplete_1435795884170",
"_type":"listing",
"_id":"53581",
"_score":2.7742395,
"_source":{
"city":"Rio",
"tags":[
"listings"
]
}
}
]
}
}
For the most part, it works: it finds the document with city = "Rio" before the user has typed the whole word ("Ri" is enough).
And here lies my problem: I want it to return "Rio de Janeiro", too. To get "Rio de Janeiro", I need to send the following query:
{
  "query": {
    "multi_match": {
      "query": "Rio d",
      "analyzer": "standard",
      "fields": [
        "city"
      ]
    }
  }
}
Notice the "<whitespace>d"
there.
Another related problem is that I'd expect at least all cities that start with an "R"
to be returned with the following query:
{
  "query": {
    "multi_match": {
      "query": "R",
      "analyzer": "standard",
      "fields": [
        "city"
      ]
    }
  }
}
I'd expect "Rome", etc. (a document that exists in the index), but again I only get "Rio". I would like it to behave like the SQL LIKE condition, i.e. ... LIKE 'CityName%'.
What am I doing wrong?
I would do it like this: use edge_nGram, since you said you need LIKE 'CityName%' (meaning a prefix match):

"tokenizer": {
  "autocomplete_edge": {
    "type": "edge_nGram",
    "min_gram": 1,
    "max_gram": 100
  }
}
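The difference matters: a plain nGram tokenizer emits every substring, while edge_nGram emits only prefixes, which is exactly the LIKE 'CityName%' behavior. A rough Python sketch of the token logic (a simplified model under these assumptions, not the actual Lucene implementation):

```python
def edge_ngrams(text, min_gram=1, max_gram=100):
    # edge_nGram keeps only the substrings anchored at the start
    # of the token; the lowercase filter is modeled by .lower()
    text = text.lower()
    return [text[:n] for n in range(min_gram, min(max_gram, len(text)) + 1)]

print(edge_ngrams("Rome"))  # ['r', 'ro', 'rom', 'rome']
```

With these tokens in the index, any prefix of the city name is an exact token match.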
Use autocomplete_search as the search_analyzer. I think it's a good choice to have keyword and lowercase there:

"mappings": {
  "listing": {
    "properties": {
      "city": {
        "type": "string",
        "index_analyzer": "autocomplete_term",
        "search_analyzer": "autocomplete_search"
      }
    }
  }
}
And the query becomes as simple as:

{
  "query": {
    "multi_match": {
      "query": "R",
      "fields": [
        "city"
      ]
    }
  }
}
The detailed explanation goes like this: split your city names into edge ngrams. For example, for Rio de Janeiro you'll index something like:
"city": [
"r",
"ri",
"rio",
"rio ",
"rio d",
"rio de",
"rio de ",
"rio de j",
"rio de ja",
"rio de jan",
"rio de jane",
"rio de janei",
"rio de janeir",
"rio de janeiro"
]
You'll notice that everything is lowercased. Now, you want your query to take any text (lowercase or not) and match it against what's in the index, so an R should match the list above.
For this to happen, the input text needs to be lowercased but otherwise kept exactly as the user typed it, meaning it shouldn't be split into ngrams. Why would you want this? Because you have already split the city names into ngrams at index time and you don't want the same to happen to the input text. If the user inputs "RI", Elasticsearch will lowercase it to ri and match it exactly against what it has in the index.
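To make both sides of the mechanism concrete, here's a small Python sketch (a simplified model of the token logic, not Lucene itself): the index analyzer produces lowercased edge ngrams, the search analyzer only lowercases, and a hit is an exact token lookup:

```python
def index_tokens(city):
    # autocomplete_term: lowercase + edge ngrams of the city name
    city = city.lower()
    return {city[:n] for n in range(1, len(city) + 1)}

def search_token(user_input):
    # autocomplete_search: keyword tokenizer + lowercase filter,
    # i.e. the input is lowercased but never split
    return user_input.lower()

cities = ["Rio", "Rio de Janeiro", "Rome"]
index = {city: index_tokens(city) for city in cities}

def autocomplete(user_input):
    token = search_token(user_input)
    return [city for city in cities if token in index[city]]

print(autocomplete("R"))      # ['Rio', 'Rio de Janeiro', 'Rome']
print(autocomplete("Rio d"))  # ['Rio de Janeiro']
```

This is exactly the LIKE 'CityName%' behavior asked for: "R" matches every city starting with an R, and "Rio d" narrows it down to "Rio de Janeiro".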
A probably faster alternative to multi_match is to use a term filter, but this requires your application/website to lowercase the text itself, because a term query doesn't analyze the input text at all.
{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "city": {
            "value": "ri"
          }
        }
      }
    }
  }
}
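Since the term filter bypasses analysis entirely, the lowercasing has to happen in application code before the request is sent. A minimal sketch that builds the request body client-side (the helper name build_city_filter is hypothetical, just for illustration):

```python
def build_city_filter(user_input):
    # The term filter does no analysis, so the application
    # lowercases the user's input itself before sending it.
    return {
        "query": {
            "filtered": {
                "filter": {
                    "term": {
                        "city": {"value": user_input.lower()}
                    }
                }
            }
        }
    }

print(build_city_filter("Ri"))
```

Whatever client library you use would then send this dict as the search body.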
In Elasticsearch, there is also the Completion Suggester, which is purpose-built for giving suggestions.