
What is the difference between Character.isAlphabetic and Character.isLetter in Java?

Tags:

java

unicode

What is the difference between Character.isAlphabetic() and Character.isLetter() in Java? When should one use one and when should one use the other?

asked Aug 18 '13 23:08 by Simon Kissane


1 Answer

According to the API docs, isLetter() returns true if the character's general category is one of the following: UPPERCASE_LETTER (Lu), LOWERCASE_LETTER (Ll), TITLECASE_LETTER (Lt), MODIFIER_LETTER (Lm), or OTHER_LETTER (Lo). isAlphabetic() accepts all of those categories, and additionally LETTER_NUMBER (Nl) and any character that has the Unicode Other_Alphabetic property.
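As a rough illustration of the category list above, the check that isLetter() is documented to perform can be written out by hand with Character.getType(). The class and method names here (LetterCategories, isLetterByCategory) are made up for this sketch and are not part of the JDK; the Other_Alphabetic part of isAlphabetic() is not reproduced, because getType() does not expose that property.

```java
public class LetterCategories {
    // Hypothetical helper: re-implements the documented isLetter() rule
    // by testing the general category returned by Character.getType().
    static boolean isLetterByCategory(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.UPPERCASE_LETTER   // Lu
            || type == Character.LOWERCASE_LETTER   // Ll
            || type == Character.TITLECASE_LETTER   // Lt
            || type == Character.MODIFIER_LETTER    // Lm
            || type == Character.OTHER_LETTER;      // Lo
    }

    public static void main(String[] args) {
        // U+2160 ROMAN NUMERAL ONE has category LETTER_NUMBER (Nl),
        // so it is outside the five letter categories.
        int romanOne = 0x2160;
        System.out.println(isLetterByCategory('a'));
        System.out.println(isLetterByCategory(romanOne));
        System.out.println(Character.isAlphabetic(romanOne));
    }
}
```

Note that isAlphabetic() only has an int (code point) overload, whereas isLetter() accepts either a char or an int.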

What does this mean in practice? Every letter is alphabetic, but not every alphabetic character is a letter. In Java 7 (which uses Unicode 6.0.0), there are 824 characters in the BMP which are alphabetic but not letters. Examples include U+0345 (a combining mark used in polytonic Greek), the Hebrew vowel points (niqqud) starting at U+05B0, Arabic honorifics such as "saw" ("peace be upon him") at U+0610, the Arabic vowel points, and many more.
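The examples above can be checked directly. The following is a minimal sketch; the names AlphabeticVsLetter, alphabeticOnly and countAlphabeticOnlyInBmp are invented for illustration. The count it prints depends on the Unicode version bundled with the running JDK, so it will not necessarily match the Java 7 figure quoted above.

```java
public class AlphabeticVsLetter {
    // True when a code point is alphabetic but not a letter.
    static boolean alphabeticOnly(int codePoint) {
        return Character.isAlphabetic(codePoint) && !Character.isLetter(codePoint);
    }

    // Counts such code points in the Basic Multilingual Plane (U+0000..U+FFFF).
    static int countAlphabeticOnlyInBmp() {
        int count = 0;
        for (int cp = 0; cp <= 0xFFFF; cp++) {
            if (alphabeticOnly(cp)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // 'A' is a plain letter; the others are the examples from the text.
        int[] samples = {'A', 0x0345, 0x05B0, 0x0610};
        for (int cp : samples) {
            System.out.println(String.format("U+%04X letter=%b alphabetic=%b",
                    cp, Character.isLetter(cp), Character.isAlphabetic(cp)));
        }
        System.out.println("Alphabetic but not letter (BMP): " + countAlphabeticOnlyInBmp());
    }
}
```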

For English text, the distinction makes no difference. For some other languages it may matter, but it is hard to predict in advance what the practical effect will be. Given a choice, isLetter() is probably the better default: one can always permit additional characters in the future, but reducing the set of accepted characters later might be harder.
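To make the trade-off concrete, here is a small sketch of a strict validator built on isLetter() next to a looser one built on isAlphabetic(). The class and method names (LetterValidation, isAllLetters, isAllAlphabetic) are hypothetical, and whether rejecting pointed Hebrew text is acceptable depends entirely on the application.

```java
public class LetterValidation {
    // Strict check: every code point must satisfy Character.isLetter().
    static boolean isAllLetters(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isLetter(cp)) {
                return false;
            }
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return true;
    }

    // Looser check: every code point must satisfy Character.isAlphabetic().
    static boolean isAllAlphabetic(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isAlphabetic(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        String english = "Hello";
        // Hebrew letter bet (U+05D1) followed by the vowel point sheva (U+05B0).
        String pointedHebrew = "\u05D1\u05B0";

        System.out.println(isAllLetters(english) + " " + isAllAlphabetic(english));
        System.out.println(isAllLetters(pointedHebrew) + " " + isAllAlphabetic(pointedHebrew));
    }
}
```

For the English input both checks agree; for the pointed Hebrew input only the isAlphabetic() version accepts the string, because the vowel point is a combining mark rather than a letter.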

answered Oct 27 '22 02:10 by Simon Kissane