
Elasticsearch: Interaction of Highlights with the Synonym Filter

We have an analyzer that includes a synonym filter, defined as follows:

        synonym_filter :
            type : synonym
            synonyms_path : synonyms.txt
            ignore_case : true
            expand : true
            format : solr

In the synonym file we have a synonym defined as follows:

dawdle,waste time

In our data we have an entity whose name field is "dawdle company".

Because of the synonym filter this gets analyzed to something like:

1 -dawdle- 2 -company- 3
1 -waste- 2 -time- 3

So "time" and "company" end up in the same position. When we search for "waste time", this entity matches. We would like the highlight to be just "dawdle", since that is the equivalent synonym, but Elasticsearch appears to treat this as two hits, because it matched both "waste" and "time", and it returns two highlights: "dawdle" and "company".
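To make the overlap concrete, here is a minimal Python sketch of the behaviour described above. It is a toy model, not Elasticsearch's actual implementation: the `SYNONYMS` table, `analyze` and `highlight` are hypothetical names, and the only assumption is that expanded synonym tokens are stacked onto the positions following the original token and that the highlighter marks whichever source word sits at a matching position.

```python
from collections import defaultdict

# Hypothetical, simplified model: a single synonym rule, expanded in place.
SYNONYMS = {"dawdle": ["waste", "time"]}

def analyze(text):
    """Return {position: set of terms}. Synonym tokens are stacked onto the
    position of the original word and the positions that follow it."""
    positions = defaultdict(set)
    for pos, word in enumerate(text.lower().split(), start=1):
        positions[pos].add(word)
        for offset, syn in enumerate(SYNONYMS.get(word, [])):
            positions[pos + offset].add(syn)
    return dict(positions)

def highlight(text, query):
    """Wrap every source word whose position holds at least one query term."""
    positions = analyze(text)
    query_terms = set(query.lower().split())
    out = []
    for pos, word in enumerate(text.split(), start=1):
        hit = positions[pos] & query_terms
        out.append(f"<em>{word}</em>" if hit else word)
    return " ".join(out)

print(analyze("dawdle company"))
print(highlight("dawdle company", "waste time"))
```

In this model "time" lands on position 2 next to "company", so a query for "waste time" marks both source words, which is the unwanted second highlight.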

Is there a recommended way to solve this kind of issue, where an unexpected word is returned in the highlights because it occupies the same position as a search term that was inserted by a synonym?

user2430530 asked Sep 13 '13 21:09



1 Answer

@SergeyS, the situation that both you and @user2430530 have is described exactly in this section of the documentation.

The suggestion there is to define a single term for each series of synonyms, so that you do not get back that mix-up of highlighted terms in the result.

Something like this:

"analysis": {
  "analyzer": {
    "synonym": {
      "tokenizer": "whitespace",
      "filter": [
        "synonym"
      ]
    }
  },
  "filter": {
    "synonym": {
      "type": "synonym",
      "synonyms": [
        "dawdle, waste time=>waste_time"
      ]
    }
  }
}
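For context, here is a hedged sketch of how that analysis fragment might sit inside a full index definition and a highlighted search, written as plain Python dictionaries and printed as JSON. The field name `text` is taken from the highlight output in this answer; the typeless `mappings` layout and the `text` field type are assumptions that fit recent Elasticsearch versions (older releases used the `string` type and per-type mappings), so adjust them to your version.

```python
import json

# Index body: the analysis fragment from the answer, wrapped in "settings".
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "synonym": {"tokenizer": "whitespace", "filter": ["synonym"]}
            },
            "filter": {
                "synonym": {
                    "type": "synonym",
                    # Contraction rule: both forms collapse to one term.
                    "synonyms": ["dawdle, waste time=>waste_time"],
                }
            },
        }
    },
    # Assumption: a field called "text" is analyzed with the "synonym" analyzer.
    "mappings": {
        "properties": {"text": {"type": "text", "analyzer": "synonym"}}
    },
}

# Search body: a match query on the same field, with highlighting requested.
search_body = {
    "query": {"match": {"text": "waste time"}},
    "highlight": {"fields": {"text": {}}},
}

print(json.dumps(index_body, indent=2))
print(json.dumps(search_body, indent=2))
```

The first body would be sent when creating the index and the second to its search endpoint; nothing here talks to a cluster, it only prints the request bodies.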

Then you'll get the desired result from ES:

        "highlight": {
           "text": [
              "some <em>dawdle</em> company"
           ]
        }
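Why the contraction helps can be sketched with a small Python toy model. This is not how Elasticsearch is implemented; `RULES`, `analyze` and `highlight` are hypothetical names, and the model only assumes that each rule match emits a single token that remembers which source words it covered.

```python
# Hypothetical, simplified model of the "dawdle, waste time=>waste_time" rule.
RULES = {("dawdle",): "waste_time", ("waste", "time"): "waste_time"}

def analyze(text):
    """Return a list of (term, first_word_index, last_word_index)."""
    words = text.lower().split()
    tokens, i = [], 0
    while i < len(words):
        for lhs, rhs in RULES.items():
            if tuple(words[i:i + len(lhs)]) == lhs:
                # One output token covers the whole matched span.
                tokens.append((rhs, i, i + len(lhs) - 1))
                i += len(lhs)
                break
        else:
            # No rule matched at this word: keep it unchanged.
            tokens.append((words[i], i, i))
            i += 1
    return tokens

def highlight(text, query):
    """Wrap the source words covered by tokens that also occur in the query."""
    query_terms = {term for term, _, _ in analyze(query)}
    marked = set()
    for term, start, end in analyze(text):
        if term in query_terms:
            marked.update(range(start, end + 1))
    return " ".join(
        f"<em>{w}</em>" if i in marked else w
        for i, w in enumerate(text.split())
    )

print(highlight("some dawdle company", "waste time"))
```

Because the query "waste time" and the indexed word "dawdle" both collapse to the single term `waste_time`, no synonym token spills into the position of "company", and only "dawdle" is marked.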
Andrei Stefan answered Sep 22 '22 18:09
