
How to determine whether a character is a letter in Java?

Tags: java, unicode

How do you check if a one-character String is a letter - including any letters with accents?

I had to work this out recently, so I'll answer it myself, after the recent VB6 question reminded me.

Peter Hilton asked Sep 18 '08 16:09




2 Answers

Character.isLetter() is much faster than string.matches(), because string.matches() compiles a new Pattern every time. Even caching the pattern, I think isLetter() would still beat it.


EDIT: Just ran across this again and thought I'd try to come up with some actual numbers. Here's my attempt at a benchmark, checking all three methods (matches() with and without caching the Pattern, and Character.isLetter()). I also made sure that there were both valid and invalid characters checked, so as not to skew things.

    import java.util.regex.*;

    class TestLetter {
        private static final Pattern ONE_CHAR_PATTERN = Pattern.compile("\\p{L}");
        private static final int NUM_TESTS = 10000000;

        public static void main(String[] args) {
            long start = System.nanoTime();
            int counter = 0;
            for (int i = 0; i < NUM_TESTS; i++) {
                if (testMatches(Character.toString((char) (i % 128))))
                    counter++;
            }
            System.out.println(NUM_TESTS + " tests of Pattern.matches() took " +
                    (System.nanoTime()-start) + " ns.");
            System.out.println("There were " + counter + "/" + NUM_TESTS +
                    " valid characters");
            /*********************************/
            start = System.nanoTime();
            counter = 0;
            for (int i = 0; i < NUM_TESTS; i++) {
                if (testCharacter(Character.toString((char) (i % 128))))
                    counter++;
            }
            System.out.println(NUM_TESTS + " tests of isLetter() took " +
                    (System.nanoTime()-start) + " ns.");
            System.out.println("There were " + counter + "/" + NUM_TESTS +
                    " valid characters");
            /*********************************/
            start = System.nanoTime();
            counter = 0;
            for (int i = 0; i < NUM_TESTS; i++) {
                if (testMatchesNoCache(Character.toString((char) (i % 128))))
                    counter++;
            }
            System.out.println(NUM_TESTS + " tests of String.matches() took " +
                    (System.nanoTime()-start) + " ns.");
            System.out.println("There were " + counter + "/" + NUM_TESTS +
                    " valid characters");
        }

        private static boolean testMatches(final String c) {
            return ONE_CHAR_PATTERN.matcher(c).matches();
        }
        private static boolean testMatchesNoCache(final String c) {
            return c.matches("\\p{L}");
        }
        private static boolean testCharacter(final String c) {
            return Character.isLetter(c.charAt(0));
        }
    }

And my output:

    10000000 tests of Pattern.matches() took 4325146672 ns.
    There were 4062500/10000000 valid characters
    10000000 tests of isLetter() took 546031201 ns.
    There were 4062500/10000000 valid characters
    10000000 tests of String.matches() took 11900205444 ns.
    There were 4062500/10000000 valid characters

So isLetter() is almost 8x faster, even compared with a cached Pattern. (And the uncached String.matches() is nearly 3x slower than the cached Pattern.)
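
For the one-character String from the question, a minimal sketch of the isLetter() approach might look like the following. The class and method names (SingleLetterCheck, isSingleLetter) are made up for illustration. It uses the code-point overload Character.isLetter(int) rather than the char overload from the benchmark, on the assumption that letters outside the Basic Multilingual Plane (which occupy two chars) should count as one character too.

```java
class SingleLetterCheck {
    // Hypothetical helper: true if s holds exactly one Unicode code point
    // and that code point is a letter.
    static boolean isSingleLetter(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        // codePointCount treats a surrogate pair as a single character.
        return s.codePointCount(0, s.length()) == 1
                && Character.isLetter(s.codePointAt(0));
    }

    public static void main(String[] args) {
        System.out.println(isSingleLetter("a"));       // plain ASCII letter
        System.out.println(isSingleLetter("\u00e9"));  // e with acute accent
        System.out.println(isSingleLetter("7"));       // digit
        System.out.println(isSingleLetter("ab"));      // two characters
    }
}
```

Note that an accented letter written as a base letter followed by a combining mark is two code points, so this sketch rejects it. Normalizing the input to NFC with java.text.Normalizer first would be one way to handle that case.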

Michael Myers answered Oct 11 '22 14:10



Just checking whether a letter is in A-Z is not enough, because that doesn't include letters with accents or letters in other alphabets.

I found out that you can use the regular expression class for 'Unicode letter', or one of its case-sensitive variations:

    string.matches("\\p{L}"); // Unicode letter
    string.matches("\\p{Lu}"); // Unicode upper-case letter
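
As a rough illustration of how these two classes behave, the sketch below applies them to a few sample strings. The class name UnicodeRegexDemo and its helper methods are hypothetical. Because String.matches() has to match the entire string, each pattern here accepts exactly one character.

```java
class UnicodeRegexDemo {
    // \p{L} matches any single Unicode letter.
    static boolean isLetter(String s) {
        return s.matches("\\p{L}");
    }

    // \p{Lu} matches any single Unicode upper-case letter.
    static boolean isUpperCaseLetter(String s) {
        return s.matches("\\p{Lu}");
    }

    public static void main(String[] args) {
        System.out.println(isLetter("\u00e9"));          // lower-case e with acute accent
        System.out.println(isUpperCaseLetter("\u00e9")); // same letter, upper-case test
        System.out.println(isUpperCaseLetter("\u00c9")); // upper-case E with acute accent
        System.out.println(isLetter("1"));               // digit
        System.out.println(isLetter("ab"));              // more than one character
    }
}
```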

You can also do this with the Character class:

    Character.isLetter(character);

but that is less convenient if you need to check more than one letter.
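
If more than one letter does need checking, one possible way to stay with the Character class is to test every code point of the string. This is only a sketch, assuming Java 8 or later for String.codePoints(); the class and method names are invented for illustration.

```java
class AllLettersCheck {
    // Hypothetical helper: true if s is non-empty and every code point is a letter.
    static boolean isAllLetters(String s) {
        // allMatch returns true for an empty stream, so empty input is rejected first.
        return s != null && !s.isEmpty()
                && s.codePoints().allMatch(Character::isLetter);
    }

    public static void main(String[] args) {
        System.out.println(isAllLetters("caf\u00e9")); // letters only, one accented
        System.out.println(isAllLetters("abc1"));      // contains a digit
        System.out.println(isAllLetters(""));          // empty string
    }
}
```

The regular-expression counterpart would be string.matches("\\p{L}+"), which is shorter to write but pays the pattern-compilation cost on every call.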

Peter Hilton answered Oct 11 '22 14:10
