
Elasticsearch Completion Suggester doesn't return documents on searches that match input

I have a weird problem with Elasticsearch 6.0.

I have an index with the following mapping:

{
  "cities": {
    "mappings": {
      "cities": {
        "properties": {
          "city": {
            "properties": {
              "id": {
                "type": "long"
              },
              "name": {
                "properties": {
                  "en": {
                    "type": "text",
                    "fields": {
                      "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                      }
                    }
                  },
                  "it": {
                    "type": "text",
                    "fields": {
                      "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                      }
                    }
                  }
                }
              },
              "slug": {
                "properties": {
                  "en": {
                    "type": "text",
                    "fields": {
                      "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                      }
                    }
                  },
                  "it": {
                    "type": "text",
                    "fields": {
                      "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                      }
                    }
                  }
                }
              }
            }
          },
          "doctype": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "suggest": {
            "type": "completion",
            "analyzer": "accents",
            "search_analyzer": "simple",
            "preserve_separators": true,
            "preserve_position_increments": false,
            "max_input_length": 50
          },
          "weight": {
            "type": "long"
          }
        }
      }
    }
  }
}

I have these documents in my index:

{
  "_index": "cities",
  "_type": "cities",
  "_id": "991-city",
  "_version": 128,
  "found": true,
  "_source": {
    "doctype": "city",
    "suggest": {
      "input": [
        "nazaré",
        "nazare",
        "나자레",
        "najare",
        "najale",
        "ナザレ",
        "Ναζαρέ"
      ],
      "weight": 1807
    },
    "weight": 3012,
    "city": {
      "id": 991,
      "name": {
        "en": "Nazaré",
        "it": "Nazaré"
      },
      "slug": {
        "en": "nazare",
        "it": "nazare"
      }
    }
  }
}

{
  "_index": "cities",
  "_type": "cities",
  "_id": "1085-city",
  "_version": 128,
  "found": true,
  "_source": {
    "doctype": "city",
    "suggest": {
      "input": [
        "nazareth",
        "nazaret",
        "拿撒勒",
        "na sa le",
        "sa le",
        "le",
        "na-sa-lei",
        "나사렛",
        "nasares",
        "nasales",
        "ナザレス",
        "nazaresu",
        "नज़ारेथ",
        "nj'aareth",
        "aareth",
        "najaratha",
        "Назарет",
        "Ναζαρέτ",
        "názáret",
        "nazaretas"
      ],
      "weight": 1809
    },
    "weight": 3015,
    "city": {
      "id": 1085,
      "name": {
        "en": "Nazareth",
        "it": "Nazareth"
      },
      "slug": {
        "en": "nazareth",
        "it": "nazareth"
      }
    }
  }
}

Now, when I search using the suggester, with the following query:

POST /cities/_search
{
  "suggest":{
    "suggest":{
      "prefix":"nazare",
      "completion":{
        "field":"suggest"
      }
    }
  }
}

I expect to have both documents in my results, but I only get the second one (nazareth) back:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": 0.0,
    "hits": []
  },
  "suggest": {
    "suggest": [
      {
        "text": "nazare",
        "offset": 0,
        "length": 6,
        "options": [
          {
            "text": "nazaresu",
            "_index": "cities",
            "_type": "cities",
            "_id": "1085-city",
            "_score": 1809.0,
            "_source": {
              "doctype": "city",
              "suggest": {
                "input": [
                  "nazareth",
                  "nazaret",
                  "拿撒勒",
                  "na sa le",
                  "sa le",
                  "le",
                  "na-sa-lei",
                  "나사렛",
                  "nasares",
                  "nasales",
                  "ナザレス",
                  "nazaresu",
                  "नज़ारेथ",
                  "nj'aareth",
                  "aareth",
                  "najaratha",
                  "Назарет",
                  "Ναζαρέτ",
                  "názáret",
                  "nazaretas"
                ],
                "weight": 1809
              },
              "weight": 3015,
              "city": {
                "id": 1085,
                "name": {
                  "en": "Nazareth",
                  "it": "Nazareth"
                },
                "slug": {
                  "en": "nazareth",
                  "it": "nazareth"
                }
              }
            }
          }
        ]
      }
    ]
  }
}

This is unexpected, because in the suggester input for the first document, the term I searched for, "nazare", appears exactly as I typed it.

Another fun fact is that if I search for "najare" instead of "nazare" I get the correct results.

Any hint will be really appreciated!

whites11 asked Oct 15 '22 13:10



1 Answer

For a quick solution, use the size parameter in the completion object of your query.

GET /cities/_search
{
  "suggest":{
    "suggest":{
      "prefix":"nazare",
      "completion":{
        "field":"suggest",
        "size": 100
      }
    }
  }
}

The size parameter defaults to 5, so once Elasticsearch has found 5 terms (not documents) with the matching prefix, it stops looking for more terms (and consequently more documents).

This limit is per term, not per document. So if one document contains 5 terms with the matching prefix and you use the default value of 5, the other documents may not be returned.

I strongly suspect that this is what is happening in your case: the returned document has at least 5 suggest terms with the prefix nazare, so only that document is returned.
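To check that count against the documents in the question, here is a small Python sketch. It assumes the custom accents analyzer (its definition is not shown in the question) lowercases text and strips diacritics; the fold helper is a hypothetical stand-in for that analyzer, not Elasticsearch code.

```python
import unicodedata


def fold(text):
    """Lowercase and strip combining marks (assumed behaviour of the 'accents' analyzer)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matching_inputs(inputs, prefix):
    """Return the suggest inputs whose folded form starts with the folded prefix."""
    folded_prefix = fold(prefix)
    return [term for term in inputs if fold(term).startswith(folded_prefix)]


# Suggest inputs copied from the two documents in the question.
doc_991 = ["nazaré", "nazare", "나자레", "najare", "najale", "ナザレ", "Ναζαρέ"]
doc_1085 = [
    "nazareth", "nazaret", "拿撒勒", "na sa le", "sa le", "le", "na-sa-lei",
    "나사렛", "nasares", "nasales", "ナザレス", "nazaresu", "नज़ारेथ",
    "nj'aareth", "aareth", "najaratha", "Назарет", "Ναζαρέτ", "názáret",
    "nazaretas",
]

for name, inputs in (("991-city", doc_991), ("1085-city", doc_1085)):
    for prefix in ("nazare", "najare"):
        hits = matching_inputs(inputs, prefix)
        print(name, prefix, len(hits), hits)
```

Under that assumption, 1085-city has five inputs matching nazare (one of them only after accent folding) and 991-city has two, while najare matches a single input, in 991-city.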

As for your fun fact: when you search for najare, only one term has the matching prefix, so you get the correct result.

The tricky part is that the result depends on the order in which Elasticsearch retrieves the documents. If the first document had been retrieved first, it would not have reached the size threshold (only 2 or 3 prefix occurrences), so the next document would also have been retrieved and you would have gotten the correct result.
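The order dependence can be illustrated with a toy model in Python. This is only a sketch of the behaviour described here, not how Elasticsearch is implemented: suggest_docs is a hypothetical function that keeps the first size matching terms and then reports the distinct documents they belong to. The term lists are the accent-folded inputs matching nazare, which is itself an assumption about the accents analyzer.

```python
def suggest_docs(entries, prefix, size=5):
    """Toy model: cut off after `size` matching terms, then list distinct documents."""
    matches = [(term, doc_id) for term, doc_id in entries if term.startswith(prefix)]
    kept = matches[:size]  # the limit applies to terms, not to documents
    docs = []
    for _, doc_id in kept:
        if doc_id not in docs:
            docs.append(doc_id)
    return docs


# Folded inputs that start with "nazare", per document (assumed analyzer output).
nazareth = [(t, "1085-city") for t in
            ["nazareth", "nazaret", "nazaresu", "nazaret", "nazaretas"]]
nazare = [(t, "991-city") for t in ["nazare", "nazare"]]

# Document 1085 visited first: its five terms fill the default size of 5.
print(suggest_docs(nazareth + nazare, "nazare"))
# Document 991 visited first: two terms, then three from 1085, so both appear.
print(suggest_docs(nazare + nazareth, "nazare"))
# A larger size makes the visiting order irrelevant.
print(suggest_docs(nazareth + nazare, "nazare", size=100))
```

In this model the default size only hides 991-city when the five terms of 1085-city are visited first, which matches the result shown in the question.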

Also, unless necessary, avoid using a very high value (e.g. > 1000) for the size parameter. It might hurt performance, particularly for short or common prefixes.

Pierre-Nicolas Mougel answered Oct 21 '22 00:10
