
Understanding Azure Search charFilters mapping

I created my index with the following custom analyzer:

"analyzers":[
 {
    "name":"shinglewhite_analyzer",
    "@odata.type":"#Microsoft.Azure.Search.CustomAnalyzer",
    "charFilters":[
       "map_dash"
    ],
    "tokenizer":"whitespace",
    "tokenFilters":[
        "shingle"
    ]
 }
],
"charFilters":[
 {
    "name":"map_dash",
    "@odata.type":"#Microsoft.Azure.Search.MappingCharFilter",
    "mappings":[ "_=> " ]
 }
]

The problem is that a word like ice_cream in the input does not match the query ice cream, although it does match icecream. Can someone help me understand how this works, and whether I have done something wrong?

We would also like the query "ice cream" to match "ice cream", "icecream", and "ice and cream", while ranking matches that keep the words in order higher.

Tony asked Feb 07 '23 06:02


1 Answer

To map a character to a space, use the following notation (we'll update the docs to include this information):

{
    "name":"map_dash",
    "@odata.type":"#Microsoft.Azure.Search.MappingCharFilter",
    "mappings":[ "_=>\\u0020" ]
}         

Also, by default the shingle token filter separates tokens with a space. If you want to join adjacent tokens into one without a separator, you need to define a customized filter like the following example and reference it by name in your analyzer's tokenFilters list in place of the built-in shingle:

{
    "name": "my_shingle",
    "@odata.type":"#Microsoft.Azure.Search.ShingleTokenFilter",
    "tokenSeparator": "" 
}

With those two changes, for the token ice_cream your analyzer will generate: ice, icecream, cream.
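To see why those tokens come out, here is a rough Python sketch of the three stages (char filter, whitespace tokenizer, shingle filter). It is only a simulation, not the service's implementation: the function names are made up for illustration, the mapping is done with a plain string replace, and it assumes the shingle filter keeps unigrams and builds shingles of size 2.

```python
def apply_char_mappings(text, mappings):
    """Replace each source string with its target (simplified mapping char filter)."""
    for source, target in mappings.items():
        text = text.replace(source, target)
    return text


def whitespace_tokenize(text):
    """Split on runs of whitespace, dropping empty tokens."""
    return text.split()


def shingle(tokens, size=2, separator=" "):
    """Emit each unigram, followed by the shingle that starts at that token."""
    output = []
    for i, token in enumerate(tokens):
        output.append(token)
        if i + size <= len(tokens):
            output.append(separator.join(tokens[i:i + size]))
    return output


def analyze(text, separator):
    """Hypothetical stand-in for the analyzer: map '_' to a space, split, shingle."""
    mapped = apply_char_mappings(text, {"_": " "})
    return shingle(whitespace_tokenize(mapped), separator=separator)


print(analyze("ice_cream", separator=""))   # ['ice', 'icecream', 'cream']
print(analyze("ice_cream", separator=" "))  # ['ice', 'ice cream', 'cream']
```

The second call shows what the space separator does: the shingle becomes the single token "ice cream" rather than "icecream".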

I hope that helps

Yahnoosh answered Feb 19 '23 17:02