
Java and Android regex difference

Good day! I have a regex pattern:

 Pattern p = Pattern.compile("^[a-zA-Z_\\$][\\w\\$]*(?:\\.[a-zA-Z_\\$][\\w\\$]*)*$");

It should tell me whether a Java/Android package name is legal. It works fine on desktop Java, but it fails on Android devices.

Let's say I have some package names:

 ". .", "ПАвыапЫВАПыва", "com.mxtech.ffmpeg.v7_neon", ...

The test should show that the only valid package is "com.mxtech.ffmpeg.v7_neon", but on Android it also reports that the test string

 "_ПАвыапЫВАПыва_"

is valid. Why? (It is Cyrillic.)

What is the difference between the Android and desktop implementations?

RedCollarPanda asked Mar 02 '16 15:03


1 Answer

The issue is caused by the fact that \w in Android regex is Unicode-aware, so it also matches Cyrillic letters. On desktop Java, \w matches only [a-zA-Z_0-9] by default.

Replace \w with [A-Za-z0-9_] to match only ASCII letters, digits and an underscore.

See the Android Pattern reference:

Note that these built-in classes don't just cover the traditional ASCII range. For example, \w is equivalent to the character class [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}]. If you actually want to match only ASCII characters, specify the explicit characters you want.
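
Following that advice, here is a minimal sketch of the pattern from the question with \w spelled out as an explicit ASCII class. The class name PackageNameCheck and the method isLegal are hypothetical names chosen for illustration; only the pattern itself comes from the question. Since the character classes no longer rely on \w, the result should not depend on whether the platform treats \w as Unicode-aware.

```java
import java.util.regex.Pattern;

public class PackageNameCheck {
    // Same structure as the pattern in the question, but every \w is
    // replaced by the explicit ASCII class [a-zA-Z0-9_] (plus '$').
    // Inside a character class '$' is a literal, so it needs no escaping.
    private static final Pattern ASCII_PACKAGE = Pattern.compile(
            "^[a-zA-Z_$][a-zA-Z0-9_$]*(?:\\.[a-zA-Z_$][a-zA-Z0-9_$]*)*$");

    // Hypothetical helper: true if every dot-separated segment is an
    // ASCII-only identifier.
    public static boolean isLegal(String name) {
        return ASCII_PACKAGE.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            ". .",
            "_ПАвыапЫВАПыва_",
            "com.mxtech.ffmpeg.v7_neon"
        };
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isLegal(s));
        }
    }
}
```

With this pattern only "com.mxtech.ffmpeg.v7_neon" is accepted. Note that this check is purely syntactic: it does not reject Java reserved words such as "int" used as a segment.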

Wiktor Stribiżew answered Sep 23 '22 01:09