I create my index with the following custom analyzer:
"analyzers":[
  {
    "name":"shinglewhite_analyzer",
    "@odata.type":"#Microsoft.Azure.Search.CustomAnalyzer",
    "charFilters":[
      "map_dash"
    ],
    "tokenizer":"whitespace",
    "tokenFilters":[
      "shingle"
    ]
  }
],
"charFilters":[
  {
    "name":"map_dash",
    "@odata.type":"#Microsoft.Azure.Search.MappingCharFilter",
    "mappings":[ "_=> " ]
  }
]
The problem is that a word like ice_cream in the input does not match the query ice cream, although it does match icecream. Can someone help me understand how this works and whether I have done something wrong?
Also, we'd like the query "ice cream" to match "ice cream", "icecream" and "ice and cream", but favor those in order.
To map to a space, please use the following notation (we'll update the docs to include this information):
{
  "name":"map_dash",
  "@odata.type":"#Microsoft.Azure.Search.MappingCharFilter",
  "mappings":[ "_=>\\u0020" ]
}
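To see why the notation matters, here is a rough Python sketch of the char filter followed by the whitespace tokenizer. It is only a toy model, not the service's actual implementation: `apply_mapping` and `whitespace_tokenize` are made-up helper names, and it assumes the unescaped mapping "_=> " loses its trailing space so that the underscore ends up mapped to an empty string, which would be consistent with ice_cream matching icecream.

```python
def apply_mapping(text, mappings):
    """Toy model of a mapping char filter: replace each source string with its target."""
    for source, target in mappings.items():
        text = text.replace(source, target)
    return text


def whitespace_tokenize(text):
    """Toy model of the whitespace tokenizer: split on runs of whitespace."""
    return text.split()


# Assumption: with "_=> " the trailing space is dropped, so "_" maps to "".
broken = whitespace_tokenize(apply_mapping("ice_cream", {"_": ""}))

# With "_=>\\u0020" the underscore becomes a real space before tokenization.
fixed = whitespace_tokenize(apply_mapping("ice_cream", {"_": " "}))

print(broken)  # ['icecream']
print(fixed)   # ['ice', 'cream']
```

Under that assumption the original mapping yields the single token icecream, which is why the query ice cream finds nothing, while the escaped mapping yields two separate tokens.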
Also, by default the shingle token filter separates tokens with a space. If you want to join subsequent tokens into one without a separator, you need to customize your filter as in the following example:
{
  "name": "my_shingle",
  "@odata.type":"#Microsoft.Azure.Search.ShingleTokenFilter",
  "tokenSeparator": ""
}
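With those two changes, and with the analyzer's tokenFilters list referencing my_shingle instead of shingle, your analyzer will generate the following tokens for ice_cream: ice, icecream, cream.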
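The token list above can be reproduced with a small Python sketch of the whole pipeline. This is a simplified simulation rather than the service's code: `shingle` and `analyze` are hypothetical helpers, and the defaults used here (shingle sizes of 2, unigrams emitted, a single space as separator) are assumed to mirror the filter's default behavior.

```python
def shingle(tokens, separator=" ", min_size=2, max_size=2, output_unigrams=True):
    """Toy shingle filter: emit each token plus the n-token shingles starting at it."""
    out = []
    for i, token in enumerate(tokens):
        if output_unigrams:
            out.append(token)
        for size in range(min_size, max_size + 1):
            if i + size <= len(tokens):
                out.append(separator.join(tokens[i:i + size]))
    return out


def analyze(text, separator):
    """Map '_' to a space, split on whitespace, then build shingles."""
    tokens = text.replace("_", " ").split()
    return shingle(tokens, separator)


print(analyze("ice_cream", " "))  # ['ice', 'ice cream', 'cream']
print(analyze("ice_cream", ""))   # ['ice', 'icecream', 'cream']
```

With the default separator the two-word shingle is "ice cream"; with an empty separator it becomes icecream, so the indexed terms cover both ways of writing the word.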
I hope that helps