
Using a normalizer with the keyword data type in Elasticsearch gives unexpected results

I created an index as follows:

PUT twitter
{
  "settings": {
    "index": {
      "analysis": {
        "normalizer": {
          "caseinsensitive_exact_match_normalizer": {
            "filter": "lowercase",
            "type": "custom"
          }
        },
        "analyzer": {
          "whitespace_lowercasefilter_analyzer": {
            "filter": "lowercase",
            "char_filter": "html_strip",
            "type": "custom",
            "tokenizer": "standard"
          }
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "col1": {
          "type": "keyword"
        },
        "col2": {
          "type": "keyword",
          "normalizer": "caseinsensitive_exact_match_normalizer"
        }
      }
    }
  }
}

Then I inserted a document into the index:

POST twitter/test
{
  "col1" : "Dhruv",
  "col2" : "Dhruv"
}

Then I queried the index:

GET twitter/_search
{
  "query": {
    "term": {
      "col2": {
        "value": "DHRUV"
      }
    }
  }
}

and I got this result:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "twitter",
        "_type": "test",
        "_id": "AV9yNWQb3aJEm8NgRhd_",
        "_score": 0.2876821,
        "_source": {
          "col1": "Dhruv",
          "col2": "Dhruv"
        }
      }
    ]
  }
}

As per my understanding, we should not get a result: the term query skips analysis, so it should search for DHRUV in the inverted index, while the value stored in the index should be dhruv because we used caseinsensitive_exact_match_normalizer. I suspect that the term query does not ignore the normalizer. Is that right?

I am using ES 5.4.1

Dhruv Pal, asked Oct 15 '25 21:10


1 Answer

It seems it is normal for a term query to apply the field's normalizer when searching. However, according to the issue linked previously, it has been decided that this is not the expected behavior.
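To make the observed behavior concrete, here is a minimal toy model in Python. It is not Elasticsearch code: the `KeywordField` class and the helper functions are hypothetical names invented for illustration, and the sketch simply assumes that the same normalizer runs both when a value is indexed and when a term query is executed, which is what the result in the question suggests.

```python
# Toy model only: none of these names come from Elasticsearch.
from collections import defaultdict


def lowercase_normalizer(value):
    """Stand-in for a custom normalizer with a lowercase filter."""
    return value.lower()


def identity(value):
    """Stand-in for a keyword field that has no normalizer."""
    return value


class KeywordField:
    def __init__(self, normalizer=identity):
        self.normalizer = normalizer
        self.postings = defaultdict(set)  # term -> set of document ids

    def index(self, doc_id, value):
        # The normalizer runs before the term is stored.
        self.postings[self.normalizer(value)].add(doc_id)

    def term_query(self, value):
        # Assumption being modelled: the normalizer also runs on the query term.
        return set(self.postings.get(self.normalizer(value), set()))


col1 = KeywordField()                      # like col1: plain keyword
col2 = KeywordField(lowercase_normalizer)  # like col2: keyword with normalizer
col1.index(1, "Dhruv")
col2.index(1, "Dhruv")

print(col1.term_query("DHRUV"))  # set()
print(col2.term_query("DHRUV"))  # {1}
```

Under this assumption, the query term DHRUV is lowercased to dhruv before the lookup on the normalized field, so it matches the stored term, while the same lookup on the field without a normalizer finds nothing.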

If you want to see what kind of query ES is rewriting yours to, you can use something like this:

GET /_validate/query?index=twitter&explain
{
  "query": {
    "term": {
      "col2": {
        "value": "DHRUV"
      }
    }
  }
}

which will show you why you get those results:

  "explanations": [
    {
      "index": "twitter",
      "valid": true,
      "explanation": "col2:dhruv"
    }
  ]
Andrei Stefan, answered Oct 18 '25 06:10


