Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Oracle REGEXP_LIKE and word boundaries

I am having a problem with matching word boundaries with REGEXP_LIKE. The following query returns a single row, as expected.

select 1 from dual
where regexp_like('DOES TEST WORK HERE','TEST');

But I want to match on word boundaries as well. So, adding the "\b" characters gives this query

select 1 from dual
where regexp_like('DOES TEST WORK HERE','\bTEST\b');

Running this returns zero rows. Any ideas?

like image 787
Greg Reynolds Avatar asked Sep 27 '11 10:09

Greg Reynolds


People also ask

How do you match a word boundary in regex?

\m matches only at the start of a word. That is, it matches at any position that has a non-word character to the left of it, and a word character to the right of it. It also matches at the start of the string if the first character in the string is a word character. \M matches only at the end of a word.

What is REGEXP_LIKE in Oracle?

REGEXP_LIKE is similar to the LIKE condition, except REGEXP_LIKE performs regular expression matching instead of the simple pattern matching performed by LIKE . This condition evaluates strings using characters as defined by the input character set.

Can we use REGEXP_LIKE in select statement?

The Oracle REGEXP_LIKE condition allows you to perform regular expression matching in the WHERE clause of a SELECT, INSERT, UPDATE, or DELETE statement.


3 Answers

I believe you want to try

 select 1 from dual    where regexp_like ('does test work here', '(^|\s)test(\s|$)'); 

because the \b does not appear on this list: Perl-influenced Extensions in Oracle Regular Expressions

The \s makes sure that test starts and ends in a whitespace. This is not sufficient, however, since the string test could also appear at the very start or end of the string being matched. Therefore, I use the alternative (indicated by the |) ^ for start of string and $ for end of string.

Update (after 3 years+)... As it happens, I needed this functionality today, and it appears to me, that even better a regular expression is (^|\s|\W)test($|\s|\W) (The missing \b regular expression special character in Oracle).

like image 53
René Nyffenegger Avatar answered Oct 01 '22 12:10

René Nyffenegger


The shortest regex that can check for a whole word in Oracle is

(^|\W)test($|\W) 

See the regex demo.

Details

  • (^|\W) - a capturing group matching either
    • ^ - start of string
    • | - or
    • \W - a non-word char
  • test - a word
  • ($|\W) - a capturing group matching either
    • $ - end of string
    • | - or
    • \W - a non-word char.

Note that \W matches any chars but letters, digits and _. If you want to match a word that can appear in between _ (underscores), you need a bit different pattern:

(^|[^[:alnum:]])test($|[^[:alnum:]]) 

The [^[:alnum:]] negated bracket expression matches any char but alphanumeric chars, and matches _, so, _test_ will be matched with this pattern.

See this regex demo.

like image 43
Wiktor Stribiżew Avatar answered Oct 01 '22 11:10

Wiktor Stribiżew


In general, I would stick with René's solution, the exception being when you need the match to be zero-length. ie You don't want to actually capture the non-word character at the beginning/end.

For example, if our string is test test then (\b)test(\b) will match twice but (^|\s|\W)test($|\s|\W) will only match the first occurrence. At least, that's certainly the case if you try to use regexp_substr.

Example

SELECT regexp_substr('test test', '(^|\s|\W)test($|\s|\W)', 1, 1, 'i'), regexp_substr('test test', '(^|\s|\W)test($|\s|\W)', 1, 2, 'i') FROM dual;

Returns

test |NULL

like image 21
ScottTracy Avatar answered Oct 01 '22 10:10

ScottTracy