Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Special characters with underscore (regex in postgres)

How to recognize with regex pattern including underscore in Postgres?

This is my actual regex:

[^\w]+

It matches characters fine, problem is with underscore. When I call function with that regex, for instance:

select regexp_replace('hello_world!', '[^\w]+', ' ', 'g')

I am expecting hello world. How to match underscore as well?

like image 987
Denis Stephanov Avatar asked Jun 03 '26 09:06

Denis Stephanov


1 Answers

The problem is that \w matches underscores, too, and when you use it in a negated character class, the pattern does not match the _ chars. You may check the class-shorthand escape table:

\w   [[:alnum:]_] (note underscore is included)

To remove all chars other than alphanumeric you may take out the _ and use

select regexp_replace('hello_world!', '[^[:alnum:]]+', ' ', 'g')

Here, [^[:alnum:]]+ matches one or more (+) consecutive chars other than ([^...] is a negated bracket expression) letters and digits ([:alnum:] POSIX character class matches letters and digits).

enter image description here

Well, you might as well use (?:\W|_)+. Unfortunately, a common regex construct like [\W_]+ won't work because \W (and other negated shorthands like \S and \D are illegal inside bracket expressions. More details are available in the manual:

Within bracket expressions, \d, \s, and \w lose their outer brackets, and \D, \S, and \W are illegal. (So, for example, [a-c\d] is equivalent to [a-c[:digit:]]. Also, [a-c\D], which is equivalent to [a-c^[:digit:]], is illegal.)

To get rid of resulting trailing/leading spaces you may use trim:

select trim(regexp_replace('hello_world!', '[^[:alnum:]]+', ' ', 'g'))
like image 134
Wiktor Stribiżew Avatar answered Jun 06 '26 07:06

Wiktor Stribiżew