I have a pattern_replace token filter (see the ES docs):
'addressPattern' => array(
'type' => 'pattern_replace',
'pattern' => '(str\.|street|and many more like this)',
'replacement' => '',
),
How can I make the match case-insensitive?
Sorry this answer is late; I found this question while searching for how to do case-insensitive pattern matching in Elasticsearch. One way is to use an embedded flag:
'pattern' => '(?i)(str\.|street|and many more like this)',
An embedded flag uses the (?xyz) syntax, where xyz is one or more flag letters. Besides 'i' for case-insensitive matching, there are 'u' for Unicode-aware case folding, 'm' for multiline, 's' for dotall, and a few others. Usually i and s are the most useful. The u flag matters for non-English words: by default (?i) only folds ASCII letters, so use (?iu) for such text. Note that an embedded flag is positional: at the start of the pattern it applies to the whole pattern, while inside a group it applies only until the end of that group.
The pattern_replace filter uses Java's standard library regex, so for more details refer to the Javadoc for java.util.regex.Pattern or the Java tutorial on regular expressions.
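To see the flag behaviour outside Elasticsearch, here is a minimal sketch using plain java.util.regex. It is not the filter itself: the class and method names are made up for illustration, the alternation is shortened to the two concrete words from the question, and "Rd" is an extra word added only to show group scoping.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; it only mimics what a pattern_replace filter would do to one token.
final class CaseInsensitiveReplace {
    // (?i) at the start: the whole pattern ignores case.
    private static final Pattern ADDRESS_WORDS = Pattern.compile("(?i)(str\\.|street)");

    // (?i) inside a group: only "street" ignores case, "Rd" stays case-sensitive.
    private static final Pattern SCOPED = Pattern.compile("((?i)street)|Rd");

    static String stripAddressWords(String token) {
        return ADDRESS_WORDS.matcher(token).replaceAll("");
    }

    static String stripScoped(String token) {
        return SCOPED.matcher(token).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println("[" + stripAddressWords("Main STREET") + "]"); // [Main ]
        System.out.println("[" + stripAddressWords("Baker Str.") + "]");  // [Baker ]
        System.out.println("[" + stripScoped("STREET rd Rd") + "]");      // [ rd ]
    }
}
```

If the same pattern string behaves as expected here, it should behave the same way in the filter's 'pattern' setting, since both go through the same regex engine.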
You can include the lowercase filter in the analyzer. For example:
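settings: {
  analysis: {
    tokenizer: {pattern_tokenizer: {... define your tokenizer here }},
    analyzer: {
      // 'my_analyzer' is a placeholder name; analyzers must be defined under a name
      my_analyzer: {
        tokenizer: 'pattern_tokenizer',
        // filters run in order, so put 'lowercase' before any pattern_replace filter
        filter: ['lowercase'],
        ....other details...
      }
    }
  }
}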
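The point is to define the lowercase filter in your analyzer, so that tokens are already lowercase by the time any pattern is applied.
If you are using a term query to match your search, you need to convert the search term to lowercase yourself before running the query, because term queries do not analyze their input.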
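As a rough sketch of that last point, the search term can be lowercased on the client before the term query is built. The class, the helper names, and the "address" field below are assumptions for illustration, and the JSON is assembled by hand without escaping, so this only suits simple terms:

```java
import java.util.Locale;

// Hypothetical helper; not part of any Elasticsearch client library.
final class TermQueryLowercase {
    // Locale.ROOT avoids locale-specific surprises such as the Turkish dotless i.
    static String normalizeTerm(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    // Builds a term query body; assumes field and term contain no characters that need JSON escaping.
    static String termQueryJson(String field, String term) {
        return "{\"query\":{\"term\":{\"" + field + "\":\"" + normalizeTerm(term) + "\"}}}";
    }

    public static void main(String[] args) {
        // "address" is an assumed field name.
        System.out.println(termQueryJson("address", "BAKER"));
    }
}
```

A match query would not need this step, since it runs the search text through the field's analyzer.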