Internationalized regular expression in PostgreSQL

How can I write a regular expression to match names like 'José' in Postgres? In other words, I need to set up a constraint to check that only valid names are entered, but I also want to allow Unicode characters.

"Regular expressions, unicode style" has some reference on this, but it seems I can't write that kind of pattern in Postgres.

If it is not possible to write a regex for this, will it be sufficient to check only on the client side using JavaScript?

robert asked Sep 29 '10 08:09


1 Answer

PostgreSQL doesn't support character classes based on the Unicode Character Database like .NET does. You get the more-standard [[:alpha:]] character class, but this is locale-dependent and probably won't cover every letter you want to allow.

You may be able to get away with just blacklisting the ASCII characters you don't want, and allowing all non-ASCII characters, e.g. something like:

[^\s!"#$%&'()*+,\-./:;<=>?\[\\\]^_`~]+

(JavaScript doesn't have non-ASCII character classes either. Or even [[:alpha:]].)

For example, given v_text as a text variable to be sanitized:

-- Allow internationalized text characters and remove undesired characters
v_text = regexp_replace( lower(trim(v_text)), '[!"#$%&()*+,./:;<=>?\[\\\]\^_\|~]+', '', 'g' );
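
The same blacklist idea can also be applied as the constraint the question asks for, rather than as a cleanup step. The following is only a sketch: the table name person, the column name, and the constraint name are hypothetical, and it assumes a UTF8 database with standard_conforming_strings = on, so backslashes in the literal reach the regex engine unchanged. The pattern is the one given above, anchored with ^ and $ so the whole value has to match, and with the single quote doubled for the SQL string literal.

```sql
-- Hypothetical table and constraint names, for illustration only.
-- Assumes a UTF8 database and standard_conforming_strings = on.
CREATE TABLE person (
    name text NOT NULL,
    -- ^ ... $ anchors force every character of the value to be outside
    -- the blacklist; '' is a single quote escaped for the SQL literal.
    CONSTRAINT person_name_allowed_chars
        CHECK (name ~ '^[^\s!"#$%&''()*+,\-./:;<=>?\[\\\]^_`~]+$')
);

-- 'José' contains no blacklisted character, so this row should be accepted.
INSERT INTO person (name) VALUES ('José');

-- '<' is blacklisted, so this row should be rejected by the CHECK constraint.
INSERT INTO person (name) VALUES ('Jos<é>');
```

Because the pattern excludes \s, a value such as 'José María' would be rejected too; dropping \s from the bracket expression is one way to allow spaces. Digits are not blacklisted here, so they would pass unless added to the list.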

bobince answered Oct 04 '22 00:10