I have a weird problem with Elasticsearch 6.0.
I have an index with the following mapping:
{
"cities": {
"mappings": {
"cities": {
"properties": {
"city": {
"properties": {
"id": {
"type": "long"
},
"name": {
"properties": {
"en": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"it": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"slug": {
"properties": {
"en": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"it": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
},
"doctype": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"suggest": {
"type": "completion",
"analyzer": "accents",
"search_analyzer": "simple",
"preserve_separators": true,
"preserve_position_increments": false,
"max_input_length": 50
},
"weight": {
"type": "long"
}
}
}
}
}
}
I have these documents in my index:
{
"_index": "cities",
"_type": "cities",
"_id": "991-city",
"_version": 128,
"found": true,
"_source": {
"doctype": "city",
"suggest": {
"input": [
"nazaré",
"nazare",
"나자레",
"najare",
"najale",
"ナザレ",
"Ναζαρέ"
],
"weight": 1807
},
"weight": 3012,
"city": {
"id": 991,
"name": {
"en": "Nazaré",
"it": "Nazaré"
},
"slug": {
"en": "nazare",
"it": "nazare"
}
}
}
}
{
"_index": "cities",
"_type": "cities",
"_id": "1085-city",
"_version": 128,
"found": true,
"_source": {
"doctype": "city",
"suggest": {
"input": [
"nazareth",
"nazaret",
"拿撒勒",
"na sa le",
"sa le",
"le",
"na-sa-lei",
"나사렛",
"nasares",
"nasales",
"ナザレス",
"nazaresu",
"नज़ारेथ",
"nj'aareth",
"aareth",
"najaratha",
"Назарет",
"Ναζαρέτ",
"názáret",
"nazaretas"
],
"weight": 1809
},
"weight": 3015,
"city": {
"id": 1085,
"name": {
"en": "Nazareth",
"it": "Nazareth"
},
"slug": {
"en": "nazareth",
"it": "nazareth"
}
}
}
}
Now, when I search using the suggester, with the following query:
POST /cities/_search
{
"suggest":{
"suggest":{
"prefix":"nazare",
"completion":{
"field":"suggest"
}
}
}
}
I expect to have both documents in my results, but I only get the second one (nazareth) back:
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 0,
"max_score": 0.0,
"hits": []
},
"suggest": {
"suggest": [
{
"text": "nazare",
"offset": 0,
"length": 6,
"options": [
{
"text": "nazaresu",
"_index": "cities",
"_type": "cities",
"_id": "1085-city",
"_score": 1809.0,
"_source": {
"doctype": "city",
"suggest": {
"input": [
"nazareth",
"nazaret",
"拿撒勒",
"na sa le",
"sa le",
"le",
"na-sa-lei",
"나사렛",
"nasares",
"nasales",
"ナザレス",
"nazaresu",
"नज़ारेथ",
"nj'aareth",
"aareth",
"najaratha",
"Назарет",
"Ναζαρέτ",
"názáret",
"nazaretas"
],
"weight": 1809
},
"weight": 3015,
"city": {
"id": 1085,
"name": {
"en": "Nazareth",
"it": "Nazareth"
},
"slug": {
"en": "nazareth",
"it": "nazareth"
}
}
}
}
]
}
]
}
}
This is unexpected, because in the suggester input for the first document, the term that I searched "nazare" appears exactly as I input it.
Another fun fact is that if I search for "najare" instead of "nazare" I get the correct results.
Any hint will be really appreciated!
For a quick solution, use the size
parameter in the completion
object of your query.
GET /cities/_search
{
"suggest":{
"suggest":{
"prefix":"nazare",
"completion":{
"field":"suggest",
"size": 100 <- HERE
}
}
}
}
The size parameter default to 5, so once elasticsearch as found 5 terms (and not document) having the correct prefix, it will stop looking for more terms (and consequently documents).
This limit is per term, not per document. So if one document contains 5 terms having the correct and you use the default value of 5, then possibly the other documents will not be returned.
I strongly believe that it is whats happening in your case. The returned document has at least 5 suggest terms having the prefix nazare
so only this one will be returned.
For your fun fact, when you are searching najare
, there is only one term having the correct prefix, so you have the correct result.
The tricky thing is that the results depends on the order elasticsearch retrieve the documents. If the first document would have been retrieved first, it would not have reach the size
threshold (only 2 or 3 prefix occurrences), the next document would be also retrieved and you would have get the correct result.
Also, unless necessary, avoid using a very high value (e.g. > 1000) for the size
parameter. It might impact the performance particularly for short or common prefixes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With