
Regular expression to match non-ASCII characters?

What is the easiest way to match non-ASCII characters in a regex? I would like to match each word individually in an input string, but the language may not be English, so I need to match characters like ü, ö, ß, and ñ. This is in JavaScript/jQuery, so any solution needs to work there.

Paul Wicks asked Sep 29 '08 18:09



1 Answer

This should do it:

[^\x00-\x7F]+ 

It matches one or more consecutive characters that are not in the ASCII character set (code points 0-127, i.e. 0x00 to 0x7F).
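As a rough sketch of how this pattern behaves in JavaScript (the helper name `findNonAscii` and the sample string are made up for illustration, not part of the original answer):

```javascript
// Hypothetical helper: returns every run of consecutive non-ASCII characters.
function findNonAscii(text) {
  // The g flag makes match() return all runs instead of only the first one.
  // match() returns null when nothing matches, so fall back to an empty array.
  return text.match(/[^\x00-\x7F]+/g) || [];
}

console.log(findNonAscii("Größe and señor"));
console.log(findNonAscii("plain ascii"));
```

Because of the `+`, adjacent non-ASCII characters such as `öß` come back as a single match rather than one match per character.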

You can write the same pattern with Unicode escapes:

[^\u0000-\u007F]+ 

For Unicode, you can look at these two resources:

  • Code charts list of Unicode ranges
  • This tool to create a regex filtered by Unicode block.
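For the question's goal of matching whole words, one possible approach is to combine the ASCII word characters with the non-ASCII range. The sketch below assumes that every non-ASCII character in the BMP counts as part of a word, which also lets through non-ASCII punctuation such as `¿`; the function name `matchWords` is hypothetical.

```javascript
// Hypothetical helper: splits text into words, treating any non-ASCII
// character (U+0080 to U+FFFF) as a word character alongside [A-Za-z0-9_].
function matchWords(text) {
  // Assumption: non-ASCII punctuation is rare enough in the input to ignore.
  return text.match(/[A-Za-z0-9_\u0080-\uFFFF]+/g) || [];
}

console.log(matchWords("Die Größe, el niño"));
```

In engines that support ES2018 Unicode property escapes, `/\p{L}+/gu` is a stricter alternative that matches only letters, though it was not available when this answer was written.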
Paige Ruten answered Sep 28 '22 03:09