
Not REGEXP_LIKE in Oracle

Tags: regex, sql, oracle

I have a large table with phone numbers. The phone numbers are all stored as strings and are supposed to look like '+9628789878' (a "+" sign followed by between 9 and 13 digits).

A bug reported by a user uncovered one row with the string '+987+9873678298'. Clearly it shouldn't be there, and I'd like to find out how many other rows have this or similar errors.

I tried the query below, but it's not doing the job. My thinking was to match anything that doesn't look like a valid number. (Oh, the table is not indexed by phone_number.)

SELECT user_key,
       first_name,
       last_name,
       phone_number
FROM   users u
WHERE  regexp_like(phone_number, '[^\+[0-9]*]')
AND    phone_number IS NOT NULL
SAR622 asked Mar 01 '17 15:03



1 Answer

If you need to find all the rows where phone_number does not consist of exactly a '+' followed by 9-13 digits, this should do the job:

select *
from users 
where not regexp_like(phone_number, '^\+[0-9]{9,13}$')

What it does:

  • ^ the beginning of the string, to avoid things like 'XX +123456789'
  • \+ the '+'
  • [0-9]{9,13} a sequence of 9-13 digits
  • $ the end of the string, to avoid strings like '+123456789 XX'
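To see the pieces of the pattern at work on a few strings, here is a minimal sketch run outside the database, in Python rather than Oracle SQL. It assumes Python's re module reads `\+` and `[0-9]{9,13}` the same way Oracle does for this simple pattern; `re.fullmatch` stands in for the `^` and `$` anchors. The name `is_valid_phone` and the sample strings (apart from the two taken from the question) are made up for illustration.

```python
import re

# Same shape as the Oracle pattern '^\+[0-9]{9,13}$'; fullmatch supplies the anchors.
PHONE_RE = re.compile(r"\+[0-9]{9,13}")

def is_valid_phone(s):
    """Return True if s is exactly a '+' followed by 9 to 13 digits."""
    return PHONE_RE.fullmatch(s) is not None

samples = [
    "+9628789878",       # from the question: '+' and 10 digits
    "+987+9873678298",   # from the question: a second '+' in the middle
    "XX +123456789",     # leading junk, rejected by the start anchor
    "+123456789 XX",     # trailing junk, rejected by the end anchor
    "+12345678",         # only 8 digits
]

# These are the kind of rows the NOT regexp_like query is meant to report.
bad = [s for s in samples if not is_valid_phone(s)]
print(bad)
```

Only the first sample passes; the other four end up in `bad`.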

Another way, with no regexp, could be the following:

select *
from users
where not (
                /* strings of 10-14 chars ('+' plus 9-13 digits) */
                length(phone_number) between 10 and 14 
                /* ... whose first char is a '+' */
            and substr(phone_number, 1, 1 ) = '+' 
                /* ... and that reduce to a single '+' once all the digits are removed */
            and nvl(translate(phone_number, 'X0123456789', 'X'), '+') = '+' 
          )
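As a sanity check that these three conditions accept the same strings as the pattern `^\+[0-9]{9,13}$`, the logic can be mirrored outside the database. This is only a sketch in Python, not Oracle SQL: `str.translate` with a deletion table plays the role of Oracle's `translate(phone_number, 'X0123456789', 'X')`, and the function name `is_valid_phone_no_regex` is hypothetical. The `nvl` has no counterpart here because Python, unlike Oracle, does not treat an empty string as NULL.

```python
import re

# Deletion table: maps every ASCII digit to None, so translate() removes them.
STRIP_DIGITS = str.maketrans("", "", "0123456789")

def is_valid_phone_no_regex(s):
    """Length 10-14, starts with '+', and only a '+' is left once digits are removed."""
    return (
        10 <= len(s) <= 14
        and s[0] == "+"
        and s.translate(STRIP_DIGITS) == "+"
    )

# Cross-check against the regex form on a few hand-picked strings.
for s in ["+9628789878", "+987+9873678298", "+123456789", "+12345678", "9628789878+"]:
    assert is_valid_phone_no_regex(s) == bool(re.fullmatch(r"\+[0-9]{9,13}", s))
```

Agreement on a handful of strings is not a proof, but it catches off-by-one mistakes in the length bounds.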

This could be faster than the regexp approach, even though it uses more conditions, but I believe only a test on your own data will tell you which one performs better.

Aleksej answered Sep 23 '22 06:09