 

Firefox throws 'invalid identity escape in regular expression' when using Perl tokens with 'u' flag [duplicate]

What is the easiest way to match non-ASCII characters in a regex? I would like to match all words individually in an input string, but the language may not be English, so I will need to match characters like ü, ö, ß, and ñ. This is in JavaScript/jQuery, so the solution needs to work there.

Paul Wicks asked Sep 29 '08 18:09


1 Answer

This should do it:

[^\x00-\x7F]+

It matches one or more consecutive characters that are not in the ASCII character set (code points 0-127, i.e. 0x00 to 0x7F).
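A minimal sketch of using that class in JavaScript with the global flag; `findNonAscii` is a hypothetical helper name, not part of any library. It also shows a limitation: because ASCII letters are excluded, a word like "Straße" yields only the run "ß", not the whole word.

```javascript
// Matches runs of one or more characters outside the ASCII range 0x00-0x7F.
const nonAscii = /[^\x00-\x7F]+/g;

// Hypothetical helper: returns every run of non-ASCII characters in `text`,
// or an empty array when there are none (match() returns null in that case).
function findNonAscii(text) {
  return text.match(nonAscii) || [];
}

const runs = findNonAscii("Straße, Müller, niño"); // ["ß", "ü", "ñ"]
const none = findNonAscii("plain ASCII");          // []
```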

You can express the same range with Unicode escapes:

[^\u0000-\u007F]+
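A quick check, assuming a standard JavaScript engine, that the `\x` and `\u` spellings describe the same class. The sample also shows that the class matches any non-ASCII character, including symbols such as "€", not only letters.

```javascript
// The same character class written with \x escapes and with \u escapes.
const withHex = /[^\x00-\x7F]+/g;
const withUnicode = /[^\u0000-\u007F]+/g;

const sample = "café déjà vu, 5 €";

// Both patterns find the same runs: ["é", "é", "à", "€"]
const sameResult =
  JSON.stringify(sample.match(withHex)) ===
  JSON.stringify(sample.match(withUnicode));

console.log(sameResult); // true
```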

For Unicode ranges, these two resources may help:

  • Code charts list of Unicode ranges
  • This tool to create a regex filtered by Unicode block.
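To match whole words (as the question asks) rather than only the non-ASCII runs, one option is to combine ASCII letters with a Unicode block range. The sketch below assumes Latin-script text and uses the letters of the Latin-1 Supplement block (U+00C0-U+00FF, skipping × U+00D7 and ÷ U+00F7). As an alternative not taken from the answer above, engines supporting ES2018 also accept the Unicode property escape `\p{L}` (any letter), which requires the `u` flag.

```javascript
// ASCII letters plus the letters of the Latin-1 Supplement block
// (U+00C0-U+00FF), excluding × (U+00D7) and ÷ (U+00F7).
const latinWord = /[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]+/g;

// Any Unicode letter; needs the `u` flag and an ES2018+ engine.
const anyLetterWord = /\p{L}+/gu;

const text = "Grüße aus Köln, señor! Привет";

const latinWords = text.match(latinWord);   // ["Grüße", "aus", "Köln", "señor"]
const allWords = text.match(anyLetterWord); // ["Grüße", "aus", "Köln", "señor", "Привет"]
```

Note that `\p{L}` does not cover combining marks (`\p{M}`), so text stored in decomposed form may need `text.normalize("NFC")` before matching.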
Paige Ruten answered Oct 26 '22 02:10