I am having a problem with matching word boundaries with REGEXP_LIKE. The following query returns a single row, as expected.
select 1 from dual
where regexp_like('DOES TEST WORK HERE','TEST');
But I want to match on word boundaries as well. So, adding the "\b" characters gives this query
select 1 from dual
where regexp_like('DOES TEST WORK HERE','\bTEST\b');
Running this returns zero rows. Any ideas?
\m matches only at the start of a word. That is, it matches at any position that has a non-word character to the left of it, and a word character to the right of it. It also matches at the start of the string if the first character in the string is a word character. \M matches only at the end of a word.
REGEXP_LIKE is similar to the LIKE condition, except REGEXP_LIKE performs regular expression matching instead of the simple pattern matching performed by LIKE . This condition evaluates strings using characters as defined by the input character set.
The Oracle REGEXP_LIKE condition allows you to perform regular expression matching in the WHERE clause of a SELECT, INSERT, UPDATE, or DELETE statement.
I believe you want to try
select 1 from dual where regexp_like ('does test work here', '(^|\s)test(\s|$)');
because the \b
does not appear on this list: Perl-influenced Extensions in Oracle Regular Expressions
The \s
makes sure that test starts and ends in a whitespace. This is not sufficient, however, since the string test
could also appear at the very start or end of the string being matched. Therefore, I use the alternative (indicated by the |
) ^
for start of string and $
for end of string.
Update (after 3 years+)... As it happens, I needed this functionality today, and it appears to me, that even better a regular expression is (^|\s|\W)test($|\s|\W)
(The missing \b regular expression special character in Oracle).
The shortest regex that can check for a whole word in Oracle is
(^|\W)test($|\W)
See the regex demo.
Details
(^|\W)
- a capturing group matching either ^
- start of string|
- or \W
- a non-word chartest
- a word($|\W)
- a capturing group matching either $
- end of string|
- or \W
- a non-word char.Note that \W
matches any chars but letters, digits and _
. If you want to match a word that can appear in between _
(underscores), you need a bit different pattern:
(^|[^[:alnum:]])test($|[^[:alnum:]])
The [^[:alnum:]]
negated bracket expression matches any char but alphanumeric chars, and matches _
, so, _test_
will be matched with this pattern.
See this regex demo.
In general, I would stick with René's solution, the exception being when you need the match to be zero-length. ie You don't want to actually capture the non-word character at the beginning/end.
For example, if our string is test test
then (\b)test(\b)
will match twice but (^|\s|\W)test($|\s|\W)
will only match the first occurrence. At least, that's certainly the case if you try to use regexp_substr.
Example
SELECT regexp_substr('test test', '(^|\s|\W)test($|\s|\W)', 1, 1, 'i'),
regexp_substr('test test', '(^|\s|\W)test($|\s|\W)', 1, 2, 'i') FROM dual;
Returns
test |NULL
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With