Check that a string contains non-Latin letters

Tags: java, regex, latin

I have the following method to check that a string contains only Latin symbols.

private boolean containsNonLatin(String val) {
        return val.matches("\\w+");
}

But it returns false if I pass the string "my string", because it contains a space. I need a method that returns false if the string contains letters that are not in the Latin alphabet, and true in all other cases.

Please help me improve my method.

Examples of valid strings:

w123.
w, 12
w#123
dsf%&@

asked Dec 11 '22 18:12 by gstackoverflow


2 Answers

You can use the \p{IsLatin} class:

return !(val.matches("[\\p{Punct}\\p{Space}\\p{IsLatin}]+$"));

Java Regex Reference
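
One caveat with this whitelist: it lists no digit class, so an input such as w123. from the question does not fully match it. A minimal sketch of the same whitelist idea with \p{Digit} added follows. The class name LatinWhitelist and the method isLatinOnly are hypothetical names chosen for illustration, and the exact set of allowed classes is an assumption about what the question intends.

```java
import java.util.regex.Pattern;

public class LatinWhitelist {
    // Allowed characters: ASCII punctuation, whitespace, ASCII digits, and
    // characters in the Latin script. The digit class is an addition to the
    // pattern in the answer (an assumption about the intended inputs).
    private static final Pattern LATIN_ONLY =
            Pattern.compile("[\\p{Punct}\\p{Space}\\p{Digit}\\p{IsLatin}]*");

    // Returns true when every character of val is in the allowed set.
    // An empty string is treated as valid because of the * quantifier.
    public static boolean isLatinOnly(String val) {
        return LATIN_ONLY.matcher(val).matches();
    }

    public static void main(String[] args) {
        // The last sample is a Cyrillic word written with Unicode escapes.
        String[] samples = {"w123.", "w, 12", "w#123", "dsf%&@", "my string",
                "\u0414\u0432\u0432"};
        for (String s : samples) {
            System.out.println(s + " -> " + isLatinOnly(s));
        }
    }
}
```

Negating the result, as the answer does, would give a containsNonLatin-style method instead.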

answered Dec 22 '22 08:12 by anubhava


"I need something like not \p{IsLatin}"

If you need to match all letters except ASCII Latin letters, you can use

"[\\p{L}\\p{M}&&[^\\p{Alpha}]]+"

By default, the \p{Alpha} POSIX class matches only [A-Za-z]. \p{L} matches any Unicode letter, and \p{M} matches combining marks (diacritics). Adding &&[^\p{Alpha}] intersects that set with "anything except [A-Za-z]", which in effect subtracts the ASCII letters from all the Unicode letters.

The whole expression matches one or more Unicode letters other than ASCII letters.
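
To see the subtraction at work on single characters, here is a small sketch. The class name ClassIntersectionDemo and the helper isNonAsciiLetter are hypothetical and exist only for this illustration. Non-ASCII characters are written as Unicode escapes so the file does not depend on the source encoding.

```java
public class ClassIntersectionDemo {
    // Same character class as in the answer, without the + quantifier,
    // so it is tested against exactly one character.
    static final String NON_ASCII_LETTER = "[\\p{L}\\p{M}&&[^\\p{Alpha}]]";

    // True when the single-character string ch is a letter or combining
    // mark that is not one of A-Z or a-z.
    public static boolean isNonAsciiLetter(String ch) {
        return ch.matches(NON_ASCII_LETTER);
    }

    public static void main(String[] args) {
        System.out.println(isNonAsciiLetter("a"));       // false: ASCII letter is subtracted
        System.out.println(isNonAsciiLetter("\u0414"));  // true: Cyrillic capital De
        System.out.println(isNonAsciiLetter("\u00e9"));  // true: e with acute accent
        System.out.println(isNonAsciiLetter("1"));       // false: a digit is not a letter
    }
}
```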

To allow whitespace as well, just add \s:

"[\\s\\p{L}\\p{M}&&[^\\p{Alpha}]]+"

See IDEONE demo:

List<String> strs = Arrays.asList("w123.", "w, 12", "w#123", "dsf%&@", "Двв");
for (String str : strs)
    System.out.println(!str.matches("[\\s\\p{L}\\p{M}&&[^\\p{Alpha}]]+")); // => 4 true, 1 false
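
Because String.matches requires the whole input to match, the loop above treats a mixed string such as wДвв the same way as a pure ASCII one: it prints true for both. If the goal is to flag any string that contains at least one non-ASCII letter, one option is Matcher.find with the same character class. This is a sketch under that assumption; NonAsciiLetterFinder and containsNonAsciiLetter are hypothetical names, not part of the answer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class NonAsciiLetterFinder {
    // One letter or combining mark that is not an ASCII letter.
    private static final Pattern NON_ASCII_LETTER =
            Pattern.compile("[\\p{L}\\p{M}&&[^\\p{Alpha}]]");

    // find() succeeds as soon as one such character occurs anywhere in val.
    public static boolean containsNonAsciiLetter(String val) {
        return NON_ASCII_LETTER.matcher(val).find();
    }

    public static void main(String[] args) {
        // The last two samples contain Cyrillic letters (Unicode escapes).
        List<String> strs = Arrays.asList("w123.", "w, 12", "w#123", "dsf%&@",
                "\u0414\u0432\u0432", "w\u0414\u0432\u0432");
        for (String str : strs) {
            // Valid means: no non-ASCII letter anywhere in the string.
            System.out.println(str + " -> " + !containsNonAsciiLetter(str));
        }
    }
}
```

With this variant the four example strings from the question are reported as valid, while both the pure Cyrillic string and the mixed one are rejected.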

answered Dec 22 '22 08:12 by Wiktor Stribiżew