
How to highlight ngram tokens in a word using Elasticsearch

I would like to highlight just the ngrams which match, not the whole word. Example:

term: "Wo"
highlight should be: "<em>Wo</em>nderfull world!"
currently it is: "<em>Wonderfull</em> world!"

Mapping is:

{
  "global_search_1495732922733" : {
    "mappings" : {
      "meeting" : {
        "properties" : {
        ...
          "name" : {
            "type" : "text",
            "analyzer" : "meeteor_index_analyzer",
            "search_analyzer" : "meeteor_search_term_analyzer"
          },
          ...
        }
      }
    }
  }
}

Analyzers are:

"analysis" : {
  "filter" : {
    "meeteor_stemmer" : {
      "name" : "english",
      "type" : "stemmer"
    },
    "meeteor_ngram" : {
      "type" : "nGram",
      "min_gram" : "2",
      "max_gram" : "15"
    }
  },
  "analyzer" : {
    "meeteor_search_term_analyzer" : {
      "filter" : [
        "lowercase",
        "asciifolding"
      ],
      "tokenizer" : "standard"
    },
    "meeteor_index_analyzer" : {
      "filter" : [
        "lowercase",
        "asciifolding",
        "meeteor_ngram"
      ],
      "tokenizer" : "standard"
    },
    "meeteor_project_id_analyzer" : {
      "tokenizer" : "standard"
    }
  }
},

Concrete example:

curl -XGET 'localhost:9200/global_search/meeting/_search?pretty' -H 'Content-Type: application/json' -d'
{
    "query": {
        "match": {
            "name": "Me"
        }
    },
    "highlight":{
      "fields": {
        "name": {}
      }
    }
}
'

The result is:

 "...highlight" : {
          "name" : [
            "Sad <em>Meeting</em>"
          ]
        }
Boti asked May 26 '17 15:05


1 Answer

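The way to achieve what you want is to use ngram as a tokenizer rather than as a token filter. A token filter keeps the start and end offsets of the token it receives, so every ngram produced from "Meeting" still points at the whole word, and the highlighter wraps the whole word. An ngram tokenizer records separate offsets for each gram. You can do something like this: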

"analysis" : {
  "filter" : {
    "meeteor_stemmer" : {
      "name" : "english",
      "type" : "stemmer"
    }
  },
  "tokenizer" : {
    "meeteor_ngram_tokenizer" : {
      "type" : "nGram",
      "min_gram" : "2",
      "max_gram" : "15"
    }
  },
  "analyzer" : {
    "meeteor_search_term_analyzer" : {
      "filter" : [
        "lowercase",
        "asciifolding"
      ],
      "tokenizer" : "standard"
    },
    "meeteor_index_analyzer" : {
      "filter" : [
        "lowercase",
        "asciifolding"
      ],
      "tokenizer" : "meeteor_ngram_tokenizer"
    },
    "meeteor_project_id_analyzer" : {
      "tokenizer" : "standard"
    }
  }
},

Once the documents are reindexed with this analyzer, only the matching ngram is highlighted:

 "...highlight" : {
          "name" : [
            "Sad <em>Me</em>eting"
          ]
        }
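
To see why the two setups highlight differently, here is a rough Python model of the token offsets. It is only a sketch and not Elasticsearch code: the function names are made up for illustration, the regex `\w+` is a crude stand-in for the standard tokenizer, and `highlight` is a simplified stand-in for the real highlighter. It assumes the default ngram tokenizer behaviour, where grams may also span spaces.

```python
import re

MIN_GRAM, MAX_GRAM = 2, 15  # same bounds as in the settings above


def standard_words(text):
    """Crude stand-in for the standard tokenizer: (word, start, end) triples."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def ngram_filter_tokens(text):
    """Model of standard tokenizer + ngram token filter.

    Every gram inherits the offsets of the whole word it was cut from.
    """
    tokens = []
    for word, start, end in standard_words(text):
        lowered = word.lower()
        for i in range(len(lowered)):
            for n in range(MIN_GRAM, MAX_GRAM + 1):
                if i + n <= len(lowered):
                    tokens.append((lowered[i:i + n], start, end))
    return tokens


def ngram_tokenizer_tokens(text):
    """Model of an ngram tokenizer: each gram carries its own offsets."""
    lowered = text.lower()
    tokens = []
    for i in range(len(lowered)):
        for n in range(MIN_GRAM, MAX_GRAM + 1):
            if i + n <= len(lowered):
                tokens.append((lowered[i:i + n], i, i + n))
    return tokens


def highlight(text, tokens, term):
    """Wrap the offsets of every token equal to the search term in <em> tags."""
    spans = sorted({(s, e) for tok, s, e in tokens if tok == term.lower()})
    out, pos = [], 0
    for start, end in spans:
        if start < pos:  # skip spans overlapping one already emitted
            continue
        out.append(text[pos:start])
        out.append("<em>" + text[start:end] + "</em>")
        pos = end
    out.append(text[pos:])
    return "".join(out)


text = "Sad Meeting"
print(highlight(text, ngram_filter_tokens(text), "Me"))     # Sad <em>Meeting</em>
print(highlight(text, ngram_tokenizer_tokens(text), "Me"))  # Sad <em>Me</em>eting
```

In the filter model the gram "me" is stored with offsets 4 to 11 (the whole word), while in the tokenizer model it is stored with offsets 4 to 6, which is what the highlighter needs to mark only part of the word.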
Bruno dos Santos answered Nov 11 '22 20:11