How do you check if a one-character String is a letter - including any letters with accents?
I had to work this out recently, so I'll answer it myself, after the recent VB6 question reminded me.
We can check whether a given character in a string is a digit by using the isDigit() method of the Character class. isDigit() is a static method that determines whether the specified character is a digit; to test for a letter, the corresponding method is Character.isLetter().
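A minimal sketch contrasting the two static methods on a one-character String. The class and method names are hypothetical, and it assumes the String holds exactly one char.

```java
public class DigitOrLetter {
    // Classifies the single char in s (assumes s.length() == 1).
    static String classify(String s) {
        char c = s.charAt(0);
        if (Character.isDigit(c)) {
            return "digit";
        } else if (Character.isLetter(c)) {
            return "letter";
        }
        return "other";
    }

    public static void main(String[] args) {
        // "\u00e9" is é, written as an escape to avoid source-encoding issues
        for (String s : new String[] {"7", "a", "\u00e9", "?"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```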
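Another approach is to compare the lowercase and uppercase variants of the character: if they differ, it is a letter, e.g. Character.toLowerCase(ch) != Character.toUpperCase(ch). This only works for letters that have case, so it misses letters from scripts without case, such as Chinese or Arabic.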
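A minimal sketch of the case-comparison heuristic in Java, using the static Character.toLowerCase(char) and Character.toUpperCase(char) methods. The helper name is hypothetical. It prints the heuristic next to Character.isLetter() to show where they disagree: the CJK character 中 is a letter but has no case, so the heuristic rejects it.

```java
public class CaseCheck {
    // Heuristic: treat c as a letter if its lower- and upper-case forms differ.
    // Only recognises *cased* letters (Latin, Greek, Cyrillic, ...).
    static boolean looksLikeCasedLetter(char c) {
        return Character.toLowerCase(c) != Character.toUpperCase(c);
    }

    public static void main(String[] args) {
        char[] samples = {'a', 'Z', '\u00e9', '5', '\u4e2d'}; // a, Z, é, 5, 中
        for (char c : samples) {
            System.out.println(c + ": case heuristic=" + looksLikeCasedLetter(c)
                    + ", isLetter=" + Character.isLetter(c));
        }
    }
}
```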
Character.isLetterOrDigit(char ch) determines whether the specified character is a letter or a digit. A character is considered to be a letter or digit if either Character.isLetter(char ch) or Character.isDigit(char ch) returns true for it.
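A short sketch that uses isLetterOrDigit() on a few samples and then checks the documented definition (letter or digit) against every possible char value. The class and method names are hypothetical.

```java
public class LetterOrDigitDemo {
    // Verifies isLetterOrDigit(c) == isLetter(c) || isDigit(c) for all 65,536 chars.
    static boolean definitionHoldsForAllChars() {
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            char c = (char) i;
            boolean combined = Character.isLetter(c) || Character.isDigit(c);
            if (Character.isLetterOrDigit(c) != combined) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(Character.isLetterOrDigit('\u00e9')); // é: true
        System.out.println(Character.isLetterOrDigit('3'));      // true
        System.out.println(Character.isLetterOrDigit('-'));      // false
        System.out.println(definitionHoldsForAllChars());        // true
    }
}
```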
You can search for a particular letter in a string using the indexOf() method of the String class. This method returns the index of the first occurrence of the character (or substring) if it is found; otherwise it returns -1.
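Note that indexOf() answers a different question (where a specific character occurs), not whether a character is a letter. A short sketch of the -1 convention; the containsChar helper is hypothetical, and String.contains() is the usual built-in shorthand for substrings.

```java
public class IndexOfDemo {
    // Hypothetical helper: indexOf returns -1 when the char is absent.
    static boolean containsChar(String s, char c) {
        return s.indexOf(c) != -1;
    }

    public static void main(String[] args) {
        String word = "caf\u00e9"; // "café"
        System.out.println(word.indexOf('\u00e9')); // 3: position of é
        System.out.println(word.indexOf("af"));     // 1: substrings work too
        System.out.println(word.indexOf('x'));      // -1: not found
        System.out.println(containsChar(word, 'f')); // true
    }
}
```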
Character.isLetter() is much faster than string.matches(), because string.matches() compiles a new Pattern every time. Even with a cached Pattern, I think isLetter() would still beat it.
EDIT: Just ran across this again and thought I'd try to come up with some actual numbers. Here's my attempt at a benchmark, checking all three methods (matches() with and without caching the Pattern, and Character.isLetter()). I also made sure that both valid and invalid characters were checked, so as not to skew things.
```java
import java.util.regex.*;

class TestLetter {
    private static final Pattern ONE_CHAR_PATTERN = Pattern.compile("\\p{L}");
    private static final int NUM_TESTS = 10000000;

    public static void main(String[] args) {
        long start = System.nanoTime();
        int counter = 0;
        for (int i = 0; i < NUM_TESTS; i++) {
            if (testMatches(Character.toString((char) (i % 128))))
                counter++;
        }
        System.out.println(NUM_TESTS + " tests of Pattern.matches() took " + (System.nanoTime()-start) + " ns.");
        System.out.println("There were " + counter + "/" + NUM_TESTS + " valid characters");
        /*********************************/
        start = System.nanoTime();
        counter = 0;
        for (int i = 0; i < NUM_TESTS; i++) {
            if (testCharacter(Character.toString((char) (i % 128))))
                counter++;
        }
        System.out.println(NUM_TESTS + " tests of isLetter() took " + (System.nanoTime()-start) + " ns.");
        System.out.println("There were " + counter + "/" + NUM_TESTS + " valid characters");
        /*********************************/
        start = System.nanoTime();
        counter = 0;
        for (int i = 0; i < NUM_TESTS; i++) {
            if (testMatchesNoCache(Character.toString((char) (i % 128))))
                counter++;
        }
        System.out.println(NUM_TESTS + " tests of String.matches() took " + (System.nanoTime()-start) + " ns.");
        System.out.println("There were " + counter + "/" + NUM_TESTS + " valid characters");
    }

    private static boolean testMatches(final String c) {
        return ONE_CHAR_PATTERN.matcher(c).matches();
    }

    private static boolean testMatchesNoCache(final String c) {
        return c.matches("\\p{L}");
    }

    private static boolean testCharacter(final String c) {
        return Character.isLetter(c.charAt(0));
    }
}
```
And my output:
```
10000000 tests of Pattern.matches() took 4325146672 ns.
There were 4062500/10000000 valid characters
10000000 tests of isLetter() took 546031201 ns.
There were 4062500/10000000 valid characters
10000000 tests of String.matches() took 11900205444 ns.
There were 4062500/10000000 valid characters
```
So isLetter() is almost 8x faster, even against a cached Pattern. (And the uncached String.matches() is nearly 3x slower than the cached Pattern.)
Just checking whether a letter is in A-Z is not enough, because that range doesn't include letters with accents or letters in other alphabets.
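To see the difference, a minimal sketch comparing a naive A-Z range check (a hypothetical helper) with Character.isLetter() on a few accented and non-Latin letters: é, ñ, Greek α and Cyrillic ж all fail the range check but pass isLetter().

```java
public class AsciiVsUnicode {
    // Naive check: only the 52 unaccented ASCII Latin letters pass.
    static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    public static void main(String[] args) {
        // e, é, ñ, Greek alpha, Cyrillic zhe
        char[] samples = {'e', '\u00e9', '\u00f1', '\u03b1', '\u0436'};
        for (char c : samples) {
            System.out.println(c + ": A-Z check=" + isAsciiLetter(c)
                    + ", isLetter=" + Character.isLetter(c));
        }
    }
}
```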
I found out that you can use the regular expression character class for 'Unicode letter', or one of its case-specific variants:
```java
string.matches("\\p{L}");  // Unicode letter
string.matches("\\p{Lu}"); // Unicode upper-case letter
```
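A minimal sketch of the same regex classes with the Patterns compiled once and reused (class and method names are hypothetical). Because matches() must match the whole input, "\\p{L}" also enforces that the String is exactly one letter, so "ab" is rejected.

```java
import java.util.regex.Pattern;

public class UnicodeLetterRegex {
    // Compiled once; String.matches() would recompile on every call.
    private static final Pattern LETTER = Pattern.compile("\\p{L}");
    private static final Pattern UPPER = Pattern.compile("\\p{Lu}");

    static boolean isLetter(String s) {
        return LETTER.matcher(s).matches();
    }

    static boolean isUpperCaseLetter(String s) {
        return UPPER.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println("\u00e9".matches("\\p{L}"));  // é: true
        System.out.println("\u00c9".matches("\\p{Lu}")); // É: true
        System.out.println(isUpperCaseLetter("\u00e9")); // é is lower case: false
        System.out.println(isLetter("1"));               // false
        System.out.println(isLetter("ab"));              // two letters: false
    }
}
```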
You can also do this with Character class:
Character.isLetter(character);
but that is less convenient if you need to check more than one letter.
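For checking a whole string rather than one character, a minimal sketch (class and method names are hypothetical) contrasting a loop over Character.isLetter(int) with a cached "\\p{L}+" Pattern. It iterates code points rather than chars because letters outside the Basic Multilingual Plane are stored as two surrogate chars, and Character.isLetter(char) returns false for each half. String.codePoints() needs Java 8 or later.

```java
import java.util.regex.Pattern;

public class AllLetters {
    private static final Pattern LETTERS = Pattern.compile("\\p{L}+");

    // Character approach: every code point must be a letter (empty string -> false).
    static boolean allLettersByCharacter(String s) {
        return !s.isEmpty() && s.codePoints().allMatch(Character::isLetter);
    }

    // Regex approach: the whole string must be one or more Unicode letters.
    static boolean allLettersByRegex(String s) {
        return LETTERS.matcher(s).matches();
    }

    public static void main(String[] args) {
        // café, naïve, a non-letter, empty, and U+1D400 (a surrogate pair)
        String[] samples = {"caf\u00e9", "na\u00efve", "abc1", "", "\uD835\uDC00"};
        for (String s : samples) {
            System.out.println("\"" + s + "\": " + allLettersByCharacter(s)
                    + " / " + allLettersByRegex(s));
        }
    }
}
```

Both methods agree on these samples; the choice is mostly about readability and, per the benchmark above, speed.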