How to recognize with regex pattern including underscore in Postgres?
This is my actual regex:
[^\w]+
It matches characters fine, problem is with underscore. When I call function with that regex, for instance:
select regexp_replace('hello_world!', '[^\w]+', ' ', 'g')
I am expecting hello world. How to match underscore as well?
The problem is that \w matches underscores, too, and when you use it in a negated character class, the pattern does not match the _ chars. You may check the class-shorthand escape table:
\w[[:alnum:]_](note underscore is included)
To remove all chars other than alphanumeric you may take out the _ and use
select regexp_replace('hello_world!', '[^[:alnum:]]+', ' ', 'g')
Here, [^[:alnum:]]+ matches one or more (+) consecutive chars other than ([^...] is a negated bracket expression) letters and digits ([:alnum:] POSIX character class matches letters and digits).

Well, you might as well use (?:\W|_)+. Unfortunately, a common regex construct like [\W_]+ won't work because \W (and other negated shorthands like \S and \D are illegal inside bracket expressions. More details are available in the manual:
Within bracket expressions,
\d,\s, and\wlose their outer brackets, and\D,\S, and\Ware illegal. (So, for example,[a-c\d]is equivalent to[a-c[:digit:]]. Also,[a-c\D], which is equivalent to[a-c^[:digit:]], is illegal.)
To get rid of resulting trailing/leading spaces you may use trim:
select trim(regexp_replace('hello_world!', '[^[:alnum:]]+', ' ', 'g'))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With