Oracle REGEXP_LIKE and word boundaries

Tags: regex, oracle, word-boundary

I am having a problem with matching word boundaries with REGEXP_LIKE. The following query returns a single row, as expected.

select 1 from dual
where regexp_like('DOES TEST WORK HERE','TEST');

But I want to match on word boundaries as well. So, adding the "\b" characters gives this query

select 1 from dual
where regexp_like('DOES TEST WORK HERE','\bTEST\b');

Running this returns zero rows. Any ideas?

asked Sep 27 '11 10:09

Greg Reynolds

3 Answers

I believe you want to try

select 1 from dual
where regexp_like('does test work here', '(^|\s)test(\s|$)');

because \b does not appear in this list: Perl-influenced Extensions in Oracle Regular Expressions

The \s makes sure that test is preceded and followed by whitespace. This is not sufficient on its own, however, since test could also appear at the very start or end of the string being matched. Therefore, I add the alternatives (indicated by the |) ^ for start of string and $ for end of string.

Update (3+ years later): As it happens, I needed this functionality today, and it appears to me that an even better regular expression is (^|\s|\W)test($|\s|\W) (see The missing \b regular expression special character in Oracle).
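
As a rough sanity check (a sketch only; the sample strings are made up for illustration), the pattern can be run against a few inputs. If the regex behaves as described above, the first three rows should be flagged and the last two should not, because there test is only part of a longer word:

```sql
-- whole_word is 1 when 'test' is delimited by string start/end or a non-word character
select s,
       case
         when regexp_like(s, '(^|\s|\W)test($|\s|\W)') then 1
         else 0
       end as whole_word
from (
  select 'does test work here' as s from dual union all  -- word in the middle
  select 'test at the start'        from dual union all  -- word at start of string
  select 'ends with test'           from dual union all  -- word at end of string
  select 'a contest entry'          from dual union all  -- preceded by a letter
  select 'testing 123'              from dual            -- followed by a letter
);
```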

answered Oct 01 '22 12:10

René Nyffenegger

The shortest regex that can check for a whole word in Oracle is

(^|\W)test($|\W)

See the regex demo.

Details

(^|\W) - a capturing group matching either
- ^ - start of string
- | - or
- \W - a non-word char
test - a word
($|\W) - a capturing group matching either
- $ - end of string
- | - or
- \W - a non-word char.

Note that \W matches any character except letters, digits and _. If you want to match a word that may be delimited by _ (underscores), you need a slightly different pattern:

(^|[^[:alnum:]])test($|[^[:alnum:]])

The [^[:alnum:]] negated bracket expression matches any character that is not alphanumeric, which includes _, so _test_ will be matched with this pattern.

See this regex demo.
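
To make the difference between the two patterns concrete, here is a small side-by-side query. It is only a sketch: the sample strings and the column names with_nonword and with_nonalnum are made up for illustration. I would expect _test_ to be flagged only by the [^[:alnum:]] variant, 'a test.' by both, and test1 by neither:

```sql
-- Compare the \W-based pattern with the [^[:alnum:]]-based pattern
select s,
       case
         when regexp_like(s, '(^|\W)test($|\W)') then 1
         else 0
       end as with_nonword,
       case
         when regexp_like(s, '(^|[^[:alnum:]])test($|[^[:alnum:]])') then 1
         else 0
       end as with_nonalnum
from (
  select '_test_'  as s from dual union all  -- delimited by underscores
  select 'a test.'      from dual union all  -- delimited by space and punctuation
  select 'test1'        from dual            -- followed by a digit
);
```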

answered Oct 01 '22 11:10

Wiktor Stribiżew

In general, I would stick with René's solution, the exception being when you need the boundary match to be zero-length, i.e. you don't want the match to actually consume the non-word character at the beginning or end.

For example, if our string is test test, then in a regex engine that supports \b, (\b)test(\b) will match twice, but (^|\s|\W)test($|\s|\W) will only match the first occurrence. The first match consumes the space between the two words, so nothing is left to act as the leading delimiter for the second one. At least, that's certainly the case if you try to use regexp_substr.

Example

SELECT regexp_substr('test test', '(^|\s|\W)test($|\s|\W)', 1, 1, 'i'),
       regexp_substr('test test', '(^|\s|\W)test($|\s|\W)', 1, 2, 'i')
FROM dual;

Returns

test |NULL
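
One way to see the effect of the consumed delimiter is to count matches with REGEXP_COUNT (available from Oracle 11g, as far as I know). This is just an illustrative sketch, and the column names are made up. With a single space between the words I would expect a count of 1, while with two spaces each occurrence has its own delimiter available, so I would expect 2:

```sql
-- The delimiter is part of each match, so adjacent words cannot share it
select regexp_count('test test',  '(^|\s|\W)test($|\s|\W)') as single_space,
       regexp_count('test  test', '(^|\s|\W)test($|\s|\W)') as double_space
from dual;
```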

answered Oct 01 '22 10:10

ScottTracy
