Case-insensitive replace in pattern_replace

Question

I have a pattern_replace token filter (see the ES docs):

'addressPattern' => array(
    'type' => 'pattern_replace',
    'pattern' => '(str\.|street|and many more like this)',
    'replacement' => '',
),

How can I make the match case-insensitive?

Blake Walsh · Accepted Answer

Sorry this answer is not timely, but I was searching for a way to do case-insensitive pattern matching in Elasticsearch myself. One way is to use an embedded flag:

'pattern' => '(?i)(str\.|street|and many more like this)',

An embedded flag uses the (?xyz) syntax, where xyz is one or more flag letters. Besides 'i' for case-insensitive, there are 'u' for Unicode case, 'm' for multiline, 's' for dotall, and a few more. Usually i and s are the most useful; u matters too when working with non-English words, because i on its own only folds ASCII letters. Note that an embedded flag is contextual: if you put it at the start of a group, it applies only within that group.

Lucene uses the regex engine from Java's standard library, so for more details refer to the Javadoc for java.util.regex.Pattern or the tutorial on Java regex patterns.
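Since the filter's pattern is handed to java.util.regex, the flag behaviour can be checked in plain Java outside Elasticsearch. The following is a minimal sketch, not the filter's actual code: the class name, the helper methods, and the sample strings are made up for illustration, and the pattern is shortened to the two alternatives shown in the question.

```java
import java.util.regex.Pattern;

public class EmbeddedFlagDemo {
    // (?i) at the very start: the whole pattern is case-insensitive.
    static final Pattern WHOLE = Pattern.compile("(?i)(str\\.|street)");

    // (?i) at the start of an inner group: only "street" is case-insensitive,
    // the "str." alternative stays case-sensitive.
    static final Pattern SCOPED = Pattern.compile("(str\\.|((?i)street))");

    // i alone folds only ASCII letters; adding u enables Unicode case folding.
    static final Pattern ASCII_ONLY = Pattern.compile("(?i)\u00e9glise");
    static final Pattern UNICODE = Pattern.compile("(?iu)\u00e9glise");

    // Same idea as a pattern_replace filter with an empty replacement.
    static String strip(Pattern p, String input) {
        return p.matcher(input).replaceAll("");
    }

    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println("[" + strip(WHOLE, "Main STREET") + "]");   // [Main ]
        System.out.println("[" + strip(WHOLE, "Haupt STR.") + "]");    // [Haupt ]
        System.out.println("[" + strip(SCOPED, "Main STREET") + "]");  // [Main ]
        System.out.println("[" + strip(SCOPED, "Haupt STR.") + "]");   // [Haupt STR.]
        System.out.println(found(ASCII_ONLY, "\u00c9GLISE"));          // false
        System.out.println(found(UNICODE, "\u00c9GLISE"));             // true
    }
}
```

For the filter in the question, the first form is the one that applies: prefix the existing pattern with (?i) and leave the rest unchanged.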

aash · Answer

You can include a lowercase filter in the analyzer. For example:

settings: {
  analysis: {
    tokenizer: {pattern_tokenizer: {... define your tokenizer here }},
    analyzer: {
      my_analyzer: {   // placeholder name, pick your own
        tokenizer: 'pattern_tokenizer',
        filter: ['lowercase'],   // list it before any pattern_replace filter
        ....other details...
      }
    }
  }
}

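The point is to put the lowercase filter in your analyzer, ahead of the pattern_replace filter. Token filters run in the order listed, so the pattern then only ever sees lowercased tokens.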

If you are using a term query to match your search, the search term is not analyzed, so you need to convert it to lowercase yourself before running the query.
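The effect of this approach can be sketched in plain Java. This is only an illustration of "lowercase first, then a case-sensitive replace" on a single token, not how Elasticsearch implements its filters; the class and method names and the sample tokens are hypothetical, and the pattern is shortened to the two alternatives from the question.

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class LowercaseThenReplace {
    // Case-sensitive pattern, unchanged from the question (shortened).
    static final Pattern ADDRESS = Pattern.compile("(str\\.|street)");

    // Mimics a lowercase filter followed by a pattern_replace filter
    // with an empty replacement, applied to one token.
    static String analyzeToken(String token) {
        String lowered = token.toLowerCase(Locale.ROOT);
        return ADDRESS.matcher(lowered).replaceAll("");
    }

    // A term query is not analyzed, so the caller lowercases the term itself.
    static String normalizeTerm(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(analyzeToken("BakerStreet"));  // baker
        System.out.println(analyzeToken("BakerSTR."));    // baker
        System.out.println(normalizeTerm("Baker"));       // baker
    }
}
```

Unlike the embedded-flag approach, this one also lowercases whatever remains of the token, which is why the search term has to be lowercased to match.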

Tags: lucene, elasticsearch

po_taka
