I need to find all records stored in postgres, which matching following regexp:
^((8|\+7)[\- ]?)?(\(?\d{3}\)?[\- ]?)?[\d\- ]{7,10}$
Something like this:
SELECT * FROM users WHERE users.phone ~ '^((8|\+7)[\- ]?)?(\(?\d{3}\)?[\- ]?)?[\d\- ]{7,10}$'
But this one falls with error:
invalid regular expression: quantifier operand invalid
Why won't Postgres work with this regex?
Using the same one in plain Ruby works just fine.
UPDATE
Problem is only with WHERE. When i try to:
SELECT '+79637434199' ~ '^((8|\+7)[\- ]?)(\(?\d{3}\)?[\- ]?)[\d\- ]{7,10}'
Postgres returns true. But when i try:
SELECT * FROM users WHERE users.phone ~ '^((8|\+7)[\- ]?)(\(?\d{3}\)?[\- ]?)[\d\- ]{7,10}'
Result: "invalid regular expression: quantifier operand invalid".
The simplest use of regex in PostgreSQL is the ~ operator, and its cousin the ~* operator. value ~ regex tests the value on the left against the regex on the right and returns true if the regex can match within the value. Note that the regex does not have to fully match the whole value, it just has to match a part.
Regex stands for Regular Expressions. Regex is a sequence of characters that defines a pattern that can filter data in PostgreSQL. The TILDE (~) operator and the wildcard operator “. *” is used to implement PostgreSQL's regular expressions.
The tilde operator returns true or false depending on whether or not a regular expression can match a string or a part thereof. ~ (Matches regular expression, case sensitive) ~* (Matches regular expression, case insensitive)
PostgreSQL names are case sensitive. By default, AWS Schema Conversion Tool (AWS SCT) uses object name in lowercase for PostgreSQL. In most cases, you'll want to use AWS Database Migration Service transformations to change schema, table, and column names to lower case.
You don't need to escape -
inside a character class when you put it at the first or last position, because it cannot be misread as range that way:
[\- ]
→ [- ]
[\d\- ]
→ [\d -]
The way you have it the upper bound 10
at the end is futile.
Add $
at the end to disallow trailing characters.
Or \D
to disallow trailing digits (but require a non-digit).
Or ($|\D)
to either end the string there or have a non-digit follow.
Put together:
SELECT '+79637434199' ~ '^(8|\+7)[ -]?(\(?\d{3}\)?[ -]?)[\d -]{7,10}($|\D)'
Otherwise your expression is just fine and it works for me on PostgreSQL 9.1.4. It should not make any difference whatsoever whether you use it in a WHERE
clause or in a SELECT
list - unless you are running into a bug with some old version (like @kgrittn commented).
If I prepend the string literal with E
, I can provoke the error message that you get. This cannot explain your problem, because you stated that the expression works fine as SELECT
item.
But, as Sherlock Holmes is quoted, "when you have excluded the impossible, whatever remains, however improbable, must be the truth."
Maybe you ran one test with standard_conforming_strings = on
and the other one with standard_conforming_strings = off
- this was the default interpretation of string literals in older versions before 9.1. Maybe with two different clients (that have a different setting as to that).
Read more in the chapter String Constants with C-style Escapes in the manual.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With