
Matching Unicode letters with RegExp

Tags: dart

I need to match Unicode letters, similar to what PCRE's \p{L} does.

Now, since Dart's RegExp class is based on ECMAScript's, it doesn't have the concept of \p{L}, sadly.

I'm looking into perhaps constructing a big character class that matches all Unicode letters, but I'm not sure where to start.

So, I want to match letters like:

foobar
מכון ראות

But the ® symbol shouldn't be matched:

BlackBerry®

Neither should ASCII control characters, punctuation marks, and so on. Essentially, every letter in every language Unicode supports should match, whether it's å, ä, φ or ת, as long as it is an actual letter.

Kai Sellgren asked Mar 20 '13 18:03




1 Answer

I know this is an old question, but RegExp has supported Unicode property escapes such as \p{Letter} since Dart 2.4, as long as the pattern is created with unicode: true. So you can do something like this:

RegExp alpha = RegExp(r'\p{Letter}', unicode: true);
print(alpha.hasMatch("f")); // true
print(alpha.hasMatch("ת")); // true
print(alpha.hasMatch("®")); // false
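
If you need more than a yes/no check, the same property escape can be used to pull out whole runs of letters or to validate an entire string. The following is only a minimal sketch, assuming a null-safe Dart SDK (2.12 or later); the names letterRun, onlyLetters and words are made up for illustration and are not part of any library.

```dart
// Hypothetical helpers for illustration; only RegExp itself comes from dart:core.

// One or more consecutive Unicode letters (\p{L} is the short form of \p{Letter}).
final RegExp letterRun = RegExp(r'\p{L}+', unicode: true);

// The whole string must consist of letters only.
final RegExp onlyLetters = RegExp(r'^\p{L}+$', unicode: true);

/// Returns every maximal run of Unicode letters found in [input].
List<String> words(String input) =>
    letterRun.allMatches(input).map((m) => m.group(0)!).toList();

void main() {
  // Three runs: "BlackBerry" and the two Hebrew words; ® and spaces are skipped.
  print(words('BlackBerry® מכון ראות'));

  print(onlyLetters.hasMatch('foobar')); // true
  print(onlyLetters.hasMatch('BlackBerry®')); // false
}
```

Note that \p{L} covers letters only. Combining marks belong to the separate category \p{M}, so a decomposed å (a followed by U+030A) would need something like [\p{L}\p{M}] if you want the mark included in the match.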
J. V. answered Oct 01 '22 14:10