elasticsearch tf-idf and ignoring field length norm in search

I would like to run searches in elasticsearch that ignore the field-length norm in tf-idf scoring. This can be done by disabling field norms in the index mappings. However, that changes how documents are indexed; I only want to change the search, because I need the norms for other types of searches. What is the best way to accomplish this? I'm using elasticsearch.js as my interface to elasticsearch.

user3071643 asked Jan 05 '23 17:01


2 Answers

You can't disable norms on a per-search basis, but you can use the Multi Fields API to add an additional field where the norms are disabled.

PUT /my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "my_field": {
          "type": "string",
          "fields": {
            "no_norms": { 
              "type":  "string",
              "norms": {
                "enabled": false
              }
            }
          }
        }
      }
    }
  }
}

Now you can search on my_field if you need norms and on my_field.no_norms if you don't. You have to reindex the data for the new field to be available for all documents; just adding it to the mapping won't change anything for existing docs.
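
As a rough sketch of how the two fields could be queried from elasticsearch.js: the helper name buildMatchBody and the useNorms flag below are made up for illustration, and the index and field names are taken from the mapping above. It only builds the request body, so it can be checked without a running cluster.

```javascript
// Hypothetical helper (not part of elasticsearch.js): builds a match query
// against either the normal field or the norm-free sub-field.
function buildMatchBody(text, useNorms) {
  // "my_field" keeps norms; "my_field.no_norms" is the multi-field without them.
  const field = useNorms ? "my_field" : "my_field.no_norms";
  return { query: { match: { [field]: text } } };
}

// Assumed usage with an already configured elasticsearch.js client:
// client.search({ index: "my_index", body: buildMatchBody("quick fox", false) });
```

Both fields are filled from the same source value, so the choice can be made per request without touching the mapping again.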

knutwalker answered Apr 16 '23 18:04


So this is the approach I ended up using. Instead of the tf-idf similarity (the current elasticsearch default), I used BM25, which is supposedly better. BM25 also has a parameter "b" that controls how much the field length norm matters: with b = 0 the field length norm is ignored, while the default value is 0.75. A discussion of BM25 can be found here. Inside my elasticsearch.yml I have

index :
  similarity:
    default:
      type: BM25
      b: 0.0
      k1: 1.2
    norm_bm25:
      type: BM25
      b: 0.75
      k1: 1.2
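
To illustrate why b = 0 removes the length norm, here is a minimal sketch of the term-frequency part of the standard BM25 formula. The function name bm25TermFactor is made up for this example; it is not elasticsearch code and leaves out the idf factor.

```javascript
// Sketch of the BM25 term-frequency factor (idf omitted).
// tf: term frequency in the field, docLen: field length,
// avgDocLen: average field length across the index.
function bm25TermFactor(tf, docLen, avgDocLen, k1 = 1.2, b = 0.75) {
  // Length normalization: equals 1 when b = 0, regardless of docLen.
  const lengthNorm = 1 - b + b * (docLen / avgDocLen);
  return (tf * (k1 + 1)) / (tf + k1 * lengthNorm);
}

// With b = 0, a short and a long field containing the term twice score the same.
const shortField = bm25TermFactor(2, 10, 100, 1.2, 0);
const longField = bm25TermFactor(2, 1000, 100, 1.2, 0);
console.log(shortField === longField);
```

With b = 0.75 the same two calls give the shorter field the higher value, which is the length-norm effect the question wants to switch off.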

For those who use the elasticsearch JavaScript API, the custom similarity can then be set during index creation:

// Create the index and select the custom similarity for it
client.indices.create({
  index: "db",
  body: {
    settings: {
      number_of_shards: 1,
      similarity: "norm_bm25"
    }
  }
});

user3071643 answered Apr 16 '23 18:04