 

How to make a Java regular expression that matches a word in any language

Tags:

java

regex

To match a word in English, I would use the pattern [a-zA-Z]+.

Is there any way to write a regular expression that matches a word in any language, even if the word contains characters like ščžé...? I have no idea which characters exist across the world's languages, so I don't think that simply extending the class to [a-zA-Zščžé]+ would be enough.
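A minimal sketch of the limitation described above, assuming Java 8 or later. The class and method names (`AsciiWordDemo`, `asciiWords`) are hypothetical. It shows that the ASCII-only class from the question splits or truncates words that contain accented letters. Non-ASCII characters are written as `\u` escapes in the string literals so the source file stays ASCII-only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AsciiWordDemo {
    // The English-only pattern from the question: ASCII letters only.
    private static final Pattern ASCII_WORD = Pattern.compile("[a-zA-Z]+");

    // Hypothetical helper: collects every match of ASCII_WORD in the input, in order.
    public static List<String> asciiWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = ASCII_WORD.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // Czech word with c-caron and y-acute: only the ASCII middle part matches.
        System.out.println(asciiWords("\u010desk\u00fd"));
        // "naive" with i-diaeresis: the word is split in two.
        System.out.println(asciiWords("na\u00efve"));
    }
}
```

The first call yields only `esk` and the second yields `na` and `ve`, which is why listing a few extra characters by hand does not scale.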

Is there a better way to write this expression?

asked Dec 15 '10 10:12 by Palo


1 Answer

According to the Pattern javadoc, \p{L}+ (written "\\p{L}+" inside a Java string literal) should match a sequence of Unicode letters, i.e. characters in the Unicode general category L. That is probably the widest reasonable definition of a letter, though you may want to look at the list of Unicode categories to decide whether to add others (e.g. there is one called "Letter Number", Nl).
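A minimal Java sketch of this approach, assuming Java 8 or later. The class `UnicodeWords`, the helper `words` and the wider `LETTERS_MARKS` pattern are hypothetical illustrations; only the `\p{L}`, `\p{M}` and `\p{Nl}` category classes come from `java.util.regex.Pattern`. The wider pattern shows one way of "adding other categories": including marks (M) also keeps combining accents attached to a word when the text is in decomposed form.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeWords {
    // \p{L}: any code point in the Unicode "Letter" category (Lu, Ll, Lt, Lm, Lo).
    public static final Pattern LETTERS = Pattern.compile("\\p{L}+");

    // Hypothetical wider variant: letters plus marks (M, e.g. combining accents)
    // and letter numbers (Nl, e.g. Roman numeral characters).
    public static final Pattern LETTERS_MARKS = Pattern.compile("[\\p{L}\\p{M}\\p{Nl}]+");

    // Returns every maximal run matched by the given pattern, in order.
    public static List<String> words(Pattern p, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Precomposed letters from Latin and Cyrillic scripts; digits are
        // category Nd, so "42" is not part of any match.
        String mixed = "hello \u0161\u010d\u017e\u00e9, \u043c\u0438\u0440 42";
        System.out.println(words(LETTERS, mixed));

        // "cafe" followed by U+0301 COMBINING ACUTE ACCENT (category Mn):
        // \p{L}+ stops before the accent, while the wider class keeps it.
        String decomposed = "cafe\u0301";
        System.out.println(words(LETTERS, decomposed).get(0).length());       // 4
        System.out.println(words(LETTERS_MARKS, decomposed).get(0).length()); // 5
    }
}
```

Whether marks should count as part of a word depends on the input; an alternative is to normalize the text to composed form first with `java.text.Normalizer.normalize(text, Normalizer.Form.NFC)` and keep the plain `\p{L}+` pattern.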

answered Sep 26 '22 05:09 by Michael Borgwardt