Word boundary in Lucene regex

1 Answer

The Elasticsearch (Lucene) regex flavor has no direct equivalent of a word boundary. A leading \b behaves like (^|[^A-Za-z0-9_]) when the word starts with a word char, and a trailing \b behaves like ($|[^A-Za-z0-9_]) when the word ends with a word char.

Thus, we need to make sure the word is preceded and followed by either a non-word char or the start/end of the string. Lucene regex is anchored by default (the pattern must match the whole string), so to make [^A-Za-z0-9_] optional at the start/end of the string, all we need is to put .* beside it and wrap both in an optional group:

(.*[^A-Za-z0-9_])?word([^A-Za-z0-9_].*)?

Details

(.*[^A-Za-z0-9_])? - an optional group: either nothing (the word is at the start of the string), or any 0+ chars other than line break chars (use (.|\n)* to match those too) followed by one non-word char
word - the literal word to match
([^A-Za-z0-9_].*)? - an optional group: one non-word char followed by any 0+ chars, up to the end of the string (which is implicit in Lucene regex).
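
The pattern above can be tried out locally with a rough sketch like the one below. It is an approximation only: Python's re module is a different regex flavor, and re.fullmatch is used here to imitate Lucene's implicit whole-string anchoring. The helper names (lucene_word_pattern, matches_like_lucene, regexp_query) and the field name are made up for illustration, and the word is assumed to contain only word chars so that no escaping is needed.

```python
import re

WORD_CHARS = "A-Za-z0-9_"


def lucene_word_pattern(word: str) -> str:
    """Build the whole-string pattern from the answer for a plain word."""
    # Assumption: the word contains only word chars, so nothing needs escaping.
    if not re.fullmatch(f"[{WORD_CHARS}]+", word):
        raise ValueError("expected a word made only of [A-Za-z0-9_]")
    return f"(.*[^{WORD_CHARS}])?{word}([^{WORD_CHARS}].*)?"


def matches_like_lucene(pattern: str, text: str) -> bool:
    """Imitate Lucene's implicit anchoring: the pattern must cover all of text."""
    return re.fullmatch(pattern, text) is not None


def regexp_query(field: str, word: str) -> dict:
    """Wrap the pattern in the body of an Elasticsearch regexp query."""
    # The field name is supplied by the caller; it is hypothetical here.
    return {"query": {"regexp": {field: {"value": lucene_word_pattern(word)}}}}


if __name__ == "__main__":
    pattern = lucene_word_pattern("word")
    for sample in ["word", "a word here", "word.", "swordfish", "words", "my_word"]:
        print(repr(sample), matches_like_lucene(pattern, sample))
```

Note that a regexp query is matched against indexed terms, so on an analyzed text field the pattern is tested against individual tokens, not against the original string.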

answered Sep 19 '22 04:09

Wiktor Stribiżew

Tags:

regex

lucene

elasticsearch

dimid
