Understand Azure Search charFilters mapping

Question

I create my index with the following custom analyzer:

"analyzers":[
 {
    "name":"shinglewhite_analyzer",
    "@odata.type":"#Microsoft.Azure.Search.CustomAnalyzer",
    "charFilters":[
       "map_dash"
    ],
    "tokenizer":"whitespace",
    "tokenFilters":[
        "shingle"
    ]
 }
],
"charFilters":[
 {
    "name":"map_dash",
     "@odata.type":"#Microsoft.Azure.Search.MappingCharFilter",
     "mappings":[ "_=> " ]
 }
]

The problem is that a word like ice_cream in the input does not match the query ice cream, although it does match icecream. Can someone help me understand how this works and whether I have done something wrong?

Also we'd like query "ice cream" to match "ice cream", "icecream" and "ice and cream" but favor those in order.

Yahnoosh · Accepted Answer

To map to a space, use the following notation (we'll update the docs to include this information):

{
    "name":"map_dash",
    "@odata.type":"#Microsoft.Azure.Search.MappingCharFilter",
    "mappings":[ "_=>\u0020" ]
}
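To see why the escaped form matters, here is a rough Python sketch of what a mapping char filter does to the text before it reaches the tokenizer. It is only an illustration, not the service's implementation: `apply_mappings` is a hypothetical helper, and the idea that the unescaped trailing space in "_=> " gets dropped (so that `_` maps to nothing) is an assumption inferred from the behaviour described in the question.

```python
def apply_mappings(text, mappings):
    """Hypothetical stand-in for a mapping char filter: replace each key with its value."""
    for source, target in mappings.items():
        text = text.replace(source, target)
    return text


# Assumption: the unescaped "_=> " loses its trailing space, so "_" maps to "".
trimmed = apply_mappings("ice_cream", {"_": ""})

# "_=>\u0020" maps "_" to an actual space character.
escaped = apply_mappings("ice_cream", {"_": "\u0020"})

# str.split() with no arguments splits on whitespace, like a whitespace tokenizer.
print(trimmed.split())  # a single token, which is why only "icecream" matched
print(escaped.split())  # two tokens
```

Under that assumption, the original definition indexes ice_cream as one token, which would explain why only the query icecream matched.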

Also, by default the shingle token filter separates tokens with a space. To join consecutive tokens into one without a separator, define a customized filter like the one below in the index's tokenFilters and reference it by name (my_shingle) in your analyzer's tokenFilters in place of shingle:

{
    "name": "my_shingle",
    "@odata.type":"#Microsoft.Azure.Search.ShingleTokenFilter",
    "tokenSeparator": "" 
}

With those two changes, for the token ice_cream your analyzer will generate: ice, icecream, cream.
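The whole chain (char filter, whitespace tokenizer, shingle filter) can be approximated in a few lines of Python to check the effect of the token separator. This is a sketch under stated assumptions, not the actual analyzer: `map_chars`, `shingle` and `analyze` are hypothetical names, and the shingle step assumes two-token shingles with the original single tokens kept, which is consistent with the output listed above.

```python
def map_chars(text, mappings):
    """Hypothetical mapping char filter: replace each key with its value."""
    for source, target in mappings.items():
        text = text.replace(source, target)
    return text


def shingle(tokens, separator=" ", size=2):
    """Emit each token, followed by the shingle that starts at that token.

    Assumes shingles of exactly `size` tokens and that single tokens are kept.
    """
    out = []
    for i, token in enumerate(tokens):
        out.append(token)
        if i + size <= len(tokens):
            out.append(separator.join(tokens[i:i + size]))
    return out


def analyze(text, separator):
    mapped = map_chars(text, {"_": "\u0020"})  # "_=>\u0020"
    tokens = mapped.split()                    # whitespace tokenizer
    return shingle(tokens, separator)


default_sep = analyze("ice_cream", " ")  # space as the token separator
empty_sep = analyze("ice_cream", "")     # "tokenSeparator": ""

print(default_sep)
print(empty_sep)
```

With the space separator the combined token is "ice cream", which a query term icecream cannot match; with the empty separator it becomes icecream.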

I hope that helps.

Tags: tokenize, analyzer, azure-cognitive-search

Tony
