I need to write a regex that identifies a word with a repeating character set at the end. In the following code fragment, the repeating character set is An. I need a regex that spots and displays it.
In the code below, \\w matches any word character (a letter, a digit, or an underscore), but I only want to match English letters.
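import java.util.regex.Matcher;
import java.util.regex.Pattern;

String stringToMatch = "IranAnAn";
// (\\w)\\1+ only matches a single character repeated consecutively, e.g. "aa"
Pattern p = Pattern.compile("(\\w)\\1+");
Matcher m = p.matcher(stringToMatch);
if (m.find())
{
    System.out.println("Word contains duplicate characters " + m.group(1));
}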
UPDATE
Word contains duplicate characters a
Word contains duplicate characters a
Word contains duplicate characters An
An efficient way to find a repeated character is to use hashing, which runs in O(N) time on average. Create an empty hash map. Scan each character of the input string and store its count in the map, keyed by the character. As soon as a character's count goes above 1, that character is a repeat and can be returned.
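The hashing approach described above can be sketched as follows. This is only a minimal illustration: the class name FirstRepeatedChar and the method firstRepeated are hypothetical names chosen for this example, and the sketch finds a single repeated character, not a repeated character set at the end of a word.

```java
import java.util.HashMap;
import java.util.Map;

class FirstRepeatedChar {
    // Hypothetical helper: returns the first character whose second
    // occurrence is seen while scanning left to right, or null if
    // no character appears more than once.
    static Character firstRepeated(String s) {
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : s.toCharArray()) {
            // merge() stores 1 for a new key, otherwise adds 1, and returns the new count
            int n = counts.merge(c, 1, Integer::sum);
            if (n > 1) {
                return c;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstRepeated("IranAnAn")); // n
        System.out.println(firstRepeated("abc"));      // null
    }
}
```

Note that the comparison is case-sensitive, so 'a' and 'A' are counted as different characters.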
A repeat is an expression that is repeated an arbitrary number of times. An expression followed by '*' can be repeated any number of times, including zero. An expression followed by '+' can be repeated any number of times, but at least once.
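As a small illustration of the difference between '*' and '+', the sketch below checks a few inputs against whole-string patterns. The class QuantifierDemo and its helper matchesWhole are hypothetical names used only for this example.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: true if the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("ab*", "a"));    // true: b may occur zero times
        System.out.println(matchesWhole("ab+", "a"));    // false: b must occur at least once
        System.out.println(matchesWhole("ab+", "abbb")); // true: b occurs three times
    }
}
```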
Special Regex Characters: These characters have special meaning in regex: . + * ? ^ $ ( ) [ ] { } | \ Escape Sequences (\char): To match a character that has special meaning in regex literally, you need to use an escape sequence, that is, prefix the character with a backslash ( \ ).
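In Java source code the backslash itself has to be doubled inside a string literal, so an escaped dot is written "\\.". The sketch below shows this, along with Pattern.quote, which escapes a whole literal at once. The class EscapeDemo and the method containsLiteral are hypothetical names for this example.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Hypothetical helper: true if text contains the literal string,
    // with any regex metacharacters in it treated as plain characters.
    static boolean containsLiteral(String text, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(Pattern.matches("a.c", "abc"));   // true: unescaped dot matches any character
        System.out.println(Pattern.matches("a\\.c", "abc")); // false: escaped dot matches only '.'
        System.out.println(Pattern.matches("a\\.c", "a.c")); // true
        System.out.println(containsLiteral("1+1=2", "1+1")); // true: '+' is treated literally
    }
}
```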
You want to capture as many characters of the repeating set as possible, so instead of (\\w) you should use (\\w+). You also want the sequence to be at the end of the word, so you need to add $. I have also removed the + after \\1, which is not needed to detect repetition, since a single repetition is enough:
Pattern p = Pattern.compile("(\\w+)\\1$");
Your program then outputs An, as expected.
Finally, if you only want to capture ASCII letters, you can use [a-zA-Z] instead of \\w:
Pattern p = Pattern.compile("([a-zA-Z]+)\\1$");
And if you want the repeated character set to be at least 2 characters long:
Pattern p = Pattern.compile("([a-zA-Z]{2,})\\1$");
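To compare the three patterns side by side, here is a small self-contained sketch. The class RepeatedSuffixDemo and the helper repeatedSuffix are hypothetical names introduced for this example; the patterns themselves are the ones given above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedSuffixDemo {
    // Hypothetical helper: returns the character set that is repeated
    // at the end of the word (capture group 1), or null if the regex
    // does not match.
    static String repeatedSuffix(String word, String regex) {
        Matcher m = Pattern.compile(regex).matcher(word);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String word = "IranAnAn";
        System.out.println(repeatedSuffix(word, "(\\w+)\\1$"));           // An
        System.out.println(repeatedSuffix(word, "([a-zA-Z]+)\\1$"));      // An
        System.out.println(repeatedSuffix(word, "([a-zA-Z]{2,})\\1$"));   // An

        // \\w also accepts digits, the letters-only pattern does not
        System.out.println(repeatedSuffix("Hello11", "(\\w+)\\1$"));      // 1
        System.out.println(repeatedSuffix("Hello11", "([a-zA-Z]+)\\1$")); // null

        // No repeated set at the end
        System.out.println(repeatedSuffix("Iran", "([a-zA-Z]+)\\1$"));    // null
    }
}
```

The "Hello11" case shows why the switch from \\w to [a-zA-Z] matters when only English letters should count as a repeat.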