Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgres regex issue

I need to find all records stored in postgres, which matching following regexp:

^((8|\+7)[\- ]?)?(\(?\d{3}\)?[\- ]?)?[\d\- ]{7,10}$

Something like this:

SELECT * FROM users WHERE users.phone ~ '^((8|\+7)[\- ]?)?(\(?\d{3}\)?[\- ]?)?[\d\- ]{7,10}$'

But this one falls with error:

invalid regular expression: quantifier operand invalid

Why won't Postgres work with this regex?

Using the same one in plain Ruby works just fine.

UPDATE

Problem is only with WHERE. When i try to:

SELECT '+79637434199' ~ '^((8|\+7)[\- ]?)(\(?\d{3}\)?[\- ]?)[\d\- ]{7,10}'

Postgres returns true. But when i try:

SELECT * FROM users WHERE users.phone ~ '^((8|\+7)[\- ]?)(\(?\d{3}\)?[\- ]?)[\d\- ]{7,10}'

Result: "invalid regular expression: quantifier operand invalid".

like image 551
Alexaner Tipugin Avatar asked Aug 22 '12 09:08

Alexaner Tipugin


People also ask

Does PostgreSQL support regex?

The simplest use of regex in PostgreSQL is the ~ operator, and its cousin the ~* operator. value ~ regex tests the value on the left against the regex on the right and returns true if the regex can match within the value. Note that the regex does not have to fully match the whole value, it just has to match a part.

What is regex in Postgres?

Regex stands for Regular Expressions. Regex is a sequence of characters that defines a pattern that can filter data in PostgreSQL. The TILDE (~) operator and the wildcard operator “. *” is used to implement PostgreSQL's regular expressions.

What does ~* mean in PostgreSQL?

The tilde operator returns true or false depending on whether or not a regular expression can match a string or a part thereof. ~ (Matches regular expression, case sensitive) ~* (Matches regular expression, case insensitive)

Is Postgres like case sensitive?

PostgreSQL names are case sensitive. By default, AWS Schema Conversion Tool (AWS SCT) uses object name in lowercase for PostgreSQL. In most cases, you'll want to use AWS Database Migration Service transformations to change schema, table, and column names to lower case.


1 Answers

You don't need to escape - inside a character class when you put it at the first or last position, because it cannot be misread as range that way:

[\- ][- ]
[\d\- ][\d -]

The way you have it the upper bound 10 at the end is futile.
Add $ at the end to disallow trailing characters.
Or \D to disallow trailing digits (but require a non-digit).
Or ($|\D) to either end the string there or have a non-digit follow.

Put together:

SELECT '+79637434199' ~ '^(8|\+7)[ -]?(\(?\d{3}\)?[ -]?)[\d -]{7,10}($|\D)'

Otherwise your expression is just fine and it works for me on PostgreSQL 9.1.4. It should not make any difference whatsoever whether you use it in a WHERE clause or in a SELECT list - unless you are running into a bug with some old version (like @kgrittn commented).


If I prepend the string literal with E, I can provoke the error message that you get. This cannot explain your problem, because you stated that the expression works fine as SELECT item.

But, as Sherlock Holmes is quoted, "when you have excluded the impossible, whatever remains, however improbable, must be the truth."

Maybe you ran one test with standard_conforming_strings = on and the other one with standard_conforming_strings = off - this was the default interpretation of string literals in older versions before 9.1. Maybe with two different clients (that have a different setting as to that).

Read more in the chapter String Constants with C-style Escapes in the manual.

like image 92
Erwin Brandstetter Avatar answered Oct 14 '22 04:10

Erwin Brandstetter