
Regular Expression for UpperCase Letters In A String

Tags: java, regex

For the life of me, I can't figure out why this regular expression is not working. It should find upper case letters in the given string and give me the count. Any ideas are welcome.

Here is the unit test code:

public class RegEx {

    @Test
    public void testCountTheNumberOfUpperCaseCharacters() {
        String testStr = "abcdefghijkTYYtyyQ";
        String regEx = "^[A-Z]+$";

        Pattern pattern = Pattern.compile(regEx);

        Matcher matcher = pattern.matcher(testStr);

        System.out.printf("Found %d, of capital letters in %s%n", matcher.groupCount(), testStr);

    }
}
David asked Dec 18 '13 15:12


5 Answers

It doesn't work because you have 2 problems:

  1. The regex is incorrect: the ^ and $ anchors force it to match the whole string. It should be "[A-Z]" for ASCII letters or "\\p{Lu}" for Unicode uppercase letters.
  2. matcher.groupCount() returns the number of capturing groups in the pattern, not the number of matches. Call matcher.find() in a loop and count the matches instead.

Correct code:

public void testCountTheNumberOfUpperCaseCharacters() {
    String testStr = "abcdefghijkTYYtyyQ";
    String regEx = "\\p{Lu}";
    Pattern pattern = Pattern.compile(regEx);
    Matcher matcher = pattern.matcher(testStr);
    int count = 0;
    // Each successful find() consumes exactly one uppercase letter.
    while (matcher.find())
        count++;
    System.out.printf("Found %d capital letters in %s%n", count, testStr);
}

UPDATE: A simpler one-liner to count the Unicode uppercase letters in a string is to delete every other character and take the length of what remains:

int countuc = testStr.replaceAll("\\P{Lu}", "").length();
anubhava answered Nov 18 '22 18:11


  1. You didn't call matches or find on the matcher. It hasn't done any work.

  2. groupCount is the wrong method to call. It returns the number of capturing groups in the pattern; your regex has none, and even if it did, that wouldn't give you the character count.

You should be using find, but with a different regex, one without anchors. I would also advise using the proper Unicode character class: "\\p{Lu}+". Use this in a while (m.find()) loop, and accumulate the total number of characters obtained from m.group(0).length() at each step.
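
As a rough sketch of the loop described above (the class name UpperCaseRunCounter and the helper countUpperCase are illustrative, not from the original post), it could look something like this:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class UpperCaseRunCounter {

    // Sums the lengths of all runs of Unicode uppercase letters in the input.
    static int countUpperCase(String input) {
        Matcher m = Pattern.compile("\\p{Lu}+").matcher(input);
        int total = 0;
        while (m.find()) {
            // group(0) is the whole match, i.e. one run of uppercase letters.
            total += m.group(0).length();
        }
        return total;
    }

    public static void main(String[] args) {
        String testStr = "abcdefghijkTYYtyyQ";
        // Runs found: "TYY" (3) and "Q" (1), so this prints a count of 4.
        System.out.printf("Found %d capital letters in %s%n",
                countUpperCase(testStr), testStr);
    }
}
```

Because the pattern matches whole runs, the loop iterates once per run rather than once per letter, which is why the lengths are accumulated instead of counting iterations.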

Marko Topolnik answered Nov 18 '22 17:11


This should do what you're after:

@Test
public void testCountTheNumberOfUpperCaseCharacters() {
  String testStr = "abcdefghijkTYYtyyQ";
  String regEx = "[A-Z]+";
  Pattern pattern = Pattern.compile(regEx);
  Matcher matcher = pattern.matcher(testStr);
  int count = 0;
  while (matcher.find()) {
    count+=matcher.group(0).length();
  }
  System.out.printf("Found %d, of capital letters in %s%n", count, testStr);
}
M21B8 answered Nov 18 '22 18:11


It should find upper case letters in the given string and give me the count.

No, it shouldn't: the ^ and $ anchors prevent it from doing so, forcing it to match only a non-empty string composed entirely of uppercase characters.

Moreover, groupCount() returns the number of capturing groups defined in the pattern, so for an expression that defines no groups it is always zero, whether or not anything matched.

If you insist on using a regex, use a simple [A-Z] expression with no anchors, and call matcher.find() in a loop. A better approach, however, would be calling Character.isUpperCase on the characters of your string, and counting the hits:

int count = 0;
for (char c : str.toCharArray()) {
    if (Character.isUpperCase(c)) {
        count++;
    }
}
Sergey Kalinichenko answered Nov 18 '22 19:11


Your pattern, as written, matches only a string that consists of one or more capital letters from beginning to end. If there are any lowercase characters in the string, it won't match.
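
To see the difference the anchors make, here is a small sketch (the class AnchoredPatternDemo and its two helper methods are made up for this illustration) that runs the anchored and unanchored patterns side by side:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, named here only for illustration.
class AnchoredPatternDemo {

    // True only when the whole input is one or more ASCII uppercase letters.
    static boolean isAllUpperCase(String input) {
        return Pattern.compile("^[A-Z]+$").matcher(input).find();
    }

    // Counts ASCII uppercase letters anywhere in the input (no anchors).
    static int countUpperCase(String input) {
        Matcher m = Pattern.compile("[A-Z]").matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(isAllUpperCase("TYYQ"));               // true
        System.out.println(isAllUpperCase("abcdefghijkTYYtyyQ")); // false
        System.out.println(countUpperCase("abcdefghijkTYYtyyQ")); // 4
    }
}
```

The anchored pattern answers "is the entire string uppercase?", while the unanchored one, driven by find() in a loop, answers "how many uppercase letters are there?".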

James Gawron answered Nov 18 '22 19:11