To forgo reading the entire problem, my basic question is:
Is there a function in PostgreSQL to escape regular expression characters in a string?
I've probed the documentation but was unable to find such a function.
Here is the full problem:
In a PostgreSQL database, I have a column with unique names in it. I also have a process which periodically inserts names into this field, and, to prevent duplicates, if it needs to enter a name that already exists, it appends a space and parentheses with a count to the end.
i.e. Name, Name (1), Name (2), Name (3), etc.
As it stands, I use the following code to find the next number to add in the series (written in plpgsql):
var_name_id := 1;
SELECT CAST(substring(a.name from E'\\((\\d+)\\)$') AS int)
INTO var_last_name_id
FROM my_table.names a
WHERE a.name LIKE var_name || ' (%)'
ORDER BY CAST(substring(a.name from E'\\((\\d+)\\)$') AS int) DESC
LIMIT 1;
IF var_last_name_id IS NOT NULL THEN
    var_name_id := var_last_name_id + 1;
END IF;
var_new_name := var_name || ' (' || var_name_id || ')';
(var_name contains the name I'm trying to insert.)
This works for now, but the problem lies in the WHERE clause:
WHERE a.name LIKE var_name || ' (%)'
This check doesn't verify that the % in question is a number, and it doesn't account for multiple parentheses, as in something like "Name ((1))". If either case existed, a cast exception would be thrown.
The WHERE clause really needs to be something more like:
WHERE a.r1_name ~* (var_name || E' \\(\\d+\\)')
But var_name could contain regular expression characters, which leads to the question above: Is there a function in PostgreSQL that escapes regular expression characters in a string, so I could do something like:
WHERE a.r1_name ~* (regex_escape(var_name) || E' \\(\\d+\\)')
Any suggestions are much appreciated, including a possible reworking of my duplicate name solution.
The \ is known as the escape character; it restores the literal meaning of the character that follows it. Similarly, * , + , ? (occurrence indicators) and ^ , $ (position anchors) have special meaning in regex. To match these characters literally, you need to escape them.
In order to use a literal ^ at the start or a literal $ at the end of a regex, the character must be escaped. Some flavors only use ^ and $ as metacharacters when they are at the start or end of the regex respectively. In those flavors, no additional escaping is necessary. It's usually just best to escape them anyway.
Now, escaping a string (in regex terms) means finding all of the characters with special meaning and putting a backslash in front of them, including in front of other backslash characters. When you've done this one time on the string, you have officially "escaped the string".
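To make the effect of escaping concrete, here is a minimal sketch in PostgreSQL. It assumes standard_conforming_strings = on, so a backslash inside a plain string literal reaches the regex engine unchanged; the sample strings are made up for illustration.

```sql
-- The unescaped dot is a metacharacter: it matches any single character.
SELECT 'a.c' ~ 'a.c'  AS dot_matches_dot      -- true
     , 'abc' ~ 'a.c'  AS dot_matches_b        -- true, probably not intended
     , 'abc' ~ 'a\.c' AS escaped_rejects_b    -- false, \. only matches a literal dot
     , 'a.c' ~ 'a\.c' AS escaped_matches_dot; -- true
```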
?= is a positive lookahead, a type of zero-width assertion. What it's saying is that the captured match must be followed by whatever is within the parentheses but that part isn't captured. Your example means the match needs to be followed by zero or more characters and then a digit (but again that part isn't captured).
To address the question at the top:
Assuming standard_conforming_strings = on, which has been the default since Postgres 9.1.
Let's start with a complete list of characters with special meaning in regular expression patterns:
!$()*+.:<=>?[\]^{|}-
Wrapped in a bracket expression, most of them lose their special meaning, with a few exceptions:
The hyphen - needs to be first or last, or it signifies a range of characters.
] and \ have to be escaped with \ (in the replacement, too).
After adding capturing parentheses for the back reference below, we get this regexp pattern:
([!$()*+.:<=>?[\\\]^{|}-])
Using it, this function escapes all special characters with a backslash (\), thereby removing the special meaning:
CREATE OR REPLACE FUNCTION f_regexp_escape(text)
RETURNS text
LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE AS
$func$
SELECT regexp_replace($1, '([!$()*+.:<=>?[\\\]^{|}-])', '\\\1', 'g')
$func$;
Add PARALLEL SAFE (because it is) in Postgres 10 or later to allow parallelism for queries using it.
SELECT f_regexp_escape('test(1) > Foo*');
Returns:
test\(1\) \> Foo\*
And while:
SELECT 'test(1) > Foo*' ~ 'test(1) > Foo*';
returns FALSE, which may come as a surprise to naive users,
SELECT 'test(1) > Foo*' ~ f_regexp_escape('test(1) > Foo*');
returns TRUE, as it should now.
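Applied to the query in the question, the filter could look like the following sketch. It assumes the f_regexp_escape() function defined above and uses a throwaway temp table, names_demo, as a hypothetical stand-in for my_table.names. The parentheses around the concatenation matter: ~ and || share the same operator precedence in Postgres and would otherwise be evaluated left to right. The ^ and $ anchors, the case-sensitive ~ and the max() aggregate are choices of this sketch, not part of the original code.

```sql
-- Hypothetical stand-in for my_table.names, for illustration only.
CREATE TEMP TABLE names_demo (name text PRIMARY KEY);
INSERT INTO names_demo VALUES
  ('Foo*'), ('Foo* (1)'), ('Foo* (2)'), ('Foo* ((7))'), ('Foo* (x)'), ('Fooo (9)');

-- Next free suffix for the name 'Foo*' (var_name in the question).
-- Only rows of the exact form 'Foo* (<digits>)' pass the filter,
-- so the cast to int never sees a non-numeric value.
SELECT COALESCE(max(substring(name FROM '\((\d+)\)$')::int), 0) + 1 AS next_id
FROM   names_demo
WHERE  name ~ ('^' || f_regexp_escape('Foo*') || ' \(\d+\)$');
```

With the sample rows above this yields 3: only 'Foo* (1)' and 'Foo* (2)' qualify. Without the escaping, the * would be read as a quantifier and 'Fooo (9)' would match as well. When no numbered duplicate exists yet, max() returns NULL and COALESCE falls back to 0, so the first suffix is 1, as in the original code.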
LIKE escape function
For completeness, the counterpart for LIKE patterns, where only three characters are special:
\%_
The manual:
The default escape character is the backslash but a different one can be selected by using the ESCAPE clause.
This function assumes the default:
CREATE OR REPLACE FUNCTION f_like_escape(text)
RETURNS text
LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE AS
$func$
SELECT replace(replace(replace($1
         , '\', '\\')  -- must come 1st
         , '%', '\%')
         , '_', '\_');
$func$;
We could use the more elegant regexp_replace() here, too, but for the few characters, a cascade of replace() functions is faster.
Again, PARALLEL SAFE in Postgres 10 or later.
SELECT f_like_escape('20% \ 50% low_prices');
Returns:
20\% \\ 50\% low\_prices
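The same idea can keep the LIKE filter from the question while treating the name literally. A minimal sketch, assuming the f_like_escape() function defined above; the inline VALUES list is hypothetical sample data:

```sql
-- Search for numbered duplicates of the literal name '50%_off'.
SELECT name
FROM  (VALUES ('50%_off (1)')
            , ('50%_off (2)')
            , ('50 percent off (3)')
            , ('500_off (4)')) AS t(name)
WHERE  name LIKE f_like_escape('50%_off') || ' (%)';
```

This returns only the first two rows. With the unescaped pattern '50%_off (%)', all four rows would match, because % and _ in the name would act as wildcards. Unlike ~, LIKE binds less tightly than ||, so no extra parentheses are needed here. Note that this still does not check that the text inside the parentheses is a number; the regular expression variant covers that.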