For the life of me, I can't figure out why this regular expression is not working. It should find upper case letters in the given string and give me the count. Any ideas are welcome.
Here is the unit test code:
public class RegEx {
    @Test
    public void testCountTheNumberOfUpperCaseCharacters() {
        String testStr = "abcdefghijkTYYtyyQ";
        String regEx = "^[A-Z]+$";
        Pattern pattern = Pattern.compile(regEx);
        Matcher matcher = pattern.matcher(testStr);
        System.out.printf("Found %d, of capital letters in %s%n", matcher.groupCount(), testStr);
    }
}
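It doesn't work because you have 2 problems:
1. The regex is incorrect: it should be "[A-Z]" for ASCII letters or \p{Lu} for Unicode uppercase letters, without the ^ and $ anchors.
2. You're not calling while (matcher.find()) before reading the result. Note that matcher.groupCount() returns the number of capturing groups in the pattern, not the number of matches, so count the loop iterations instead.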
Correct code:
public void testCountTheNumberOfUpperCaseCharacters() {
    String testStr = "abcdefghijkTYYtyyQ";
    String regEx = "(\\p{Lu})";
    Pattern pattern = Pattern.compile(regEx);
    Matcher matcher = pattern.matcher(testStr);
    // Each successful find() is one upper-case letter; groupCount() would
    // only report the number of capturing groups in the pattern (here 1).
    int count = 0;
    while (matcher.find()) {
        count++;
    }
    System.out.printf("Found %d, of capital letters in %s%n", count, testStr);
}
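UPDATE: A much simpler one-liner to count the number of Unicode upper case letters in a string:
int countuc = testStr.split("(?=\\p{Lu})").length - 1;
Note that this under-counts by one when the string starts with an upper case letter, because on Java 8 and later a zero-width match at the beginning of the input produces no leading empty substring.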
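A minimal sketch of a variant that does not depend on where the first upper case letter sits: delete every character that is not an upper case letter and take the length of what remains. The class and method names are invented for illustration.

```java
class UpperCaseByReplace {
    // Removes everything that is NOT \p{Lu} and measures what is left.
    // length() counts UTF-16 code units, so a letter outside the BMP
    // would be counted twice; that is ignored in this sketch.
    static int countUpperCase(String s) {
        return s.replaceAll("\\P{Lu}", "").length();
    }

    public static void main(String[] args) {
        System.out.println(countUpperCase("abcdefghijkTYYtyyQ")); // prints 4
        System.out.println(countUpperCase("Abc"));                // prints 1
    }
}
```

The second call is the case where the split one-liner would be off by one.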
You didn't call matches or find on the matcher. It hasn't done any work.
groupCount is the wrong method to call. Your regex has no capture groups, and even if it did, it wouldn't give you the character count.
You should be using find, but with a different regex, one without anchors. I would also advise using the proper Unicode character class: "\\p{Lu}+". Use this in a while (m.find()) loop, and accumulate the total number of characters obtained from m.group(0).length() at each step.
This should do what you're after:
@Test
public void testCountTheNumberOfUpperCaseCharacters() {
    String testStr = "abcdefghijkTYYtyyQ";
    String regEx = "[A-Z]+";
    Pattern pattern = Pattern.compile(regEx);
    Matcher matcher = pattern.matcher(testStr);
    int count = 0;
    while (matcher.find()) {
        count += matcher.group(0).length();
    }
    System.out.printf("Found %d, of capital letters in %s%n", count, testStr);
}
"It should find upper case letters in the given string and give me the count."
No, it shouldn't: the ^ and $ anchors prevent it from doing so, forcing the regex to look for a non-empty string composed entirely of uppercase characters.
Moreover, groupCount() does not count matches at all: it returns the number of capturing groups defined in the pattern, so for an expression that defines no groups it is always zero, whether or not anything matched.
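To see what groupCount() reports in practice, here is a small sketch that calls it without performing any match at all. The class and helper names are hypothetical; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    // Returns groupCount() for the regex; find()/matches() is never called.
    static int groupsIn(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.groupCount();
    }

    public static void main(String[] args) {
        String testStr = "abcdefghijkTYYtyyQ";
        System.out.println(groupsIn("^[A-Z]+$", testStr));         // prints 0
        System.out.println(groupsIn("(\\p{Lu})", testStr));        // prints 1
        System.out.println(groupsIn("([a-z]+)([A-Z]+)", testStr)); // prints 2
    }
}
```

The result depends only on the parentheses in the pattern, not on the input string.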
If you insist on using a regex, use a simple [A-Z] expression with no anchors, and call matcher.find() in a loop. A better approach, however, would be calling Character.isUpperCase on the characters of your string, and counting the hits:
int count = 0;
for (char c : str.toCharArray()) {
    if (Character.isUpperCase(c)) {
        count++;
    }
}
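If you are on Java 8 or later, the same idea can be sketched with a stream over the string's code points, which also copes with characters outside the BMP. The class and method names below are made up for illustration.

```java
class UpperCaseByCodePoint {
    // Streams the code points of s and keeps those that
    // Character.isUpperCase(int) reports as upper case.
    static long countUpperCase(String s) {
        return s.codePoints().filter(Character::isUpperCase).count();
    }

    public static void main(String[] args) {
        System.out.println(countUpperCase("abcdefghijkTYYtyyQ")); // prints 4
    }
}
```

Note the result is a long, since that is what IntStream.count() returns.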
Your pattern as you've written it looks for one or more capital letters spanning the whole input, from the beginning to the end. If there are any lowercase characters in the input, it won't match.
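A quick way to check the effect of the anchors is to run the same input through both forms of the pattern. This is only a sketch, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // With ^ and $, the run of capitals has to cover the whole input.
    static boolean anchoredFinds(String s) {
        return Pattern.compile("^[A-Z]+$").matcher(s).find();
    }

    // Without anchors, any run of capitals anywhere in the input will do.
    static boolean unanchoredFinds(String s) {
        return Pattern.compile("[A-Z]+").matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(anchoredFinds("abcdefghijkTYYtyyQ"));   // prints false
        System.out.println(anchoredFinds("TYYQ"));                 // prints true
        System.out.println(unanchoredFinds("abcdefghijkTYYtyyQ")); // prints true
    }
}
```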