
Case-insensitive replace in pattern_replace

I have a pattern_replace token filter (ES docs):

'addressPattern' => array(
                'type' => 'pattern_replace',
                'pattern' => '(str\.|street|and many more like this)',
                'replacement' => '',
            ),

How can I make the match case-insensitive?

po_taka asked Nov 14 '13 09:11



2 Answers

Sorry that this answer is not timely, but I came across this question while searching for how to perform case-insensitive pattern matching in Elasticsearch. One way is to use embedded flags:

'pattern' => '(?i)(str\.|street|and many more like this)',

An embedded flag uses the (?xyz) syntax, where xyz are one or more flag letters. Besides 'i' for case-insensitive matching, there are 'u' for Unicode-aware case folding, 'm' for multiline, 's' for dotall, and a few more. Usually i and s are the most useful flags; u matters when working with non-English words, because i on its own only folds case for US-ASCII characters. Note that an embedded flag is contextual: if you put it at the start of a group, it applies only within that group.
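To see how these flags behave, here is a minimal sketch that calls java.util.regex directly, outside Elasticsearch. The class name `EmbeddedFlagDemo`, the helper `strip`, and the sample strings are hypothetical and chosen only for illustration; the "and many more like this" alternative from the question is left out.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Elasticsearch or Lucene.
final class EmbeddedFlagDemo {

    // Removes every match of the regex, as an empty 'replacement' would.
    static String strip(String regex, String input) {
        return Pattern.compile(regex).matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // No flag: the lowercase pattern does not match "STREET".
        System.out.println(strip("(str\\.|street)", "Main STREET"));

        // (?i) at the start: the whole pattern ignores case.
        System.out.println(strip("(?i)(str\\.|street)", "Main STREET"));

        // (?i) inside a group: only that group ignores case,
        // so " north" after the group must still match exactly.
        System.out.println(strip("((?i)street) north", "Main STREET north"));
        System.out.println(strip("((?i)street) north", "Main STREET NORTH"));

        // (?i) alone folds ASCII letters only; (?iu) also folds
        // the accented first letter (written here as an escape).
        System.out.println(strip("(?i)\u00e9cole", "\u00c9COLE"));
        System.out.println(strip("(?iu)\u00e9cole", "\u00c9COLE"));
    }
}
```

The first, fourth, and fifth calls leave the input untouched because the pattern does not match; the others remove the matched text.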

Lucene uses Java's standard library regex, so for more details refer to the Java docs for java.util.regex.Pattern or the tutorial on Java regex patterns.

Blake Walsh answered Oct 02 '22 15:10



You can include a lowercase filter in the analyzer. For example:

settings: {
  analysis: {
    tokenizer: {pattern_tokenizer: {... define your tokenizer here }},
    analyzer: {
      // analyzers are defined under a name; 'my_analyzer' is a placeholder
      my_analyzer: {
        tokenizer: 'pattern_tokenizer',
        filter: ['lowercase'],
        ....other details...
      }
    }
  }
}

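The point is to define the lowercase filter in your analyzer. Token filters run in the order they are listed, so lowercase has to come before the pattern_replace filter, and the pattern itself should then be written in lowercase.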

If you are using a term query to match your search, the search term is not analyzed, so you need to convert it to lowercase yourself before running the query.
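The idea behind this approach can be sketched in plain Java, outside Elasticsearch: once the token is lowercased, an ordinary case-sensitive pattern written in lowercase is enough, and the search term gets the same lowercasing. The class `LowercaseThenReplaceDemo`, the methods `analyze` and `normalizeTerm`, and the sample token are hypothetical, and `toLowerCase(Locale.ROOT)` is only a stand-in for the lowercase filter, not its actual implementation.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical demo class; it only imitates the order of a filter chain.
final class LowercaseThenReplaceDemo {

    // Case-sensitive pattern, written in lowercase only.
    private static final Pattern ADDRESS = Pattern.compile("(str\\.|street)");

    // Step 1: lowercase the token. Step 2: replace matches with "".
    static String analyze(String token) {
        String lowered = token.toLowerCase(Locale.ROOT);
        return ADDRESS.matcher(lowered).replaceAll("");
    }

    // The caller lowercases the search term before using it in a term query.
    static String normalizeTerm(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String indexed = analyze("MainSTREET");
        String searched = normalizeTerm("Main");
        // Both sides end up in the same lowercase form.
        System.out.println(indexed.equals(searched));
    }
}
```

Swapping the two steps in `analyze` would leave "STREET" in place, since the lowercase-only pattern would run against the original casing.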

aash answered Oct 02 '22 16:10
