
Writing a regex to detect repeat-characters [duplicate]

Tags: java, regex

I need to write a regex that identifies a word ending with a repeated character sequence. In the following code fragment, the repeated sequence is An. The regex should detect this sequence and display it.

In the following code, \\w matches any word character (a letter, a digit, or the underscore), but I only want to match English letters.

String stringToMatch = "IranAnAn";
Pattern p = Pattern.compile("(\\w)\\1+");
Matcher m = p.matcher(stringToMatch);
if (m.find())
{
    System.out.println("Word contains duplicate characters " + m.group(1));
}

UPDATE

Word contains duplicate characters a
Word contains duplicate characters a
Word contains duplicate characters An
Sharon Watinsan asked Jul 22 '13 17:07



1 Answer

You want the group to capture as many characters as possible, so use (\\w+) instead of (\\w). You also want the repeated sequence to be at the end of the word, so add $. I have removed the + after \\1 because it is not needed to detect repetition: a single repeat is enough.

Pattern p = Pattern.compile("(\\w+)\\1$");

Your program then outputs An as expected.
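
As a minimal sketch, the fragment from the question can be wrapped into a complete program using this pattern. The class name RepeatedSuffix and the helper findRepeatedSuffix are hypothetical names chosen for illustration; they do not come from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class, for illustration only.
class RepeatedSuffix {
    // Group 1 captures a run of word characters; \1 requires the same run
    // to appear again immediately, and $ anchors the pair at the end.
    private static final Pattern REPEATED_AT_END = Pattern.compile("(\\w+)\\1$");

    // Returns the repeated sequence, or null if the word does not end with one.
    static String findRepeatedSuffix(String word) {
        Matcher m = REPEATED_AT_END.matcher(word);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String repeated = findRepeatedSuffix("IranAnAn");
        if (repeated != null) {
            // prints: Word contains duplicate characters An
            System.out.println("Word contains duplicate characters " + repeated);
        }
    }
}
```

Returning null for a word with no repeated ending (for example "Iran") keeps the sketch short; returning an Optional would work equally well.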

Finally, if you only want to capture ASCII letters, you can use [a-zA-Z] instead of \\w:

Pattern p = Pattern.compile("([a-zA-Z]+)\\1$");

And if you want the repeated sequence to be at least 2 characters long:

Pattern p = Pattern.compile("([a-zA-Z]{2,})\\1$");
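
To see how the three patterns differ, here is a small comparison sketch. The class name SuffixVariants, the helper repeatedSuffix, and the sample words "abc1212" and "Balloo" are assumptions made for illustration, not part of the original question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison harness, for illustration only.
class SuffixVariants {
    static final String WORD_CHARS = "(\\w+)\\1$";           // letters, digits, underscore
    static final String LETTERS = "([a-zA-Z]+)\\1$";         // ASCII letters only
    static final String TWO_OR_MORE = "([a-zA-Z]{2,})\\1$";  // at least 2 ASCII letters

    // Returns the repeated sequence found at the end of word, or null if none.
    static String repeatedSuffix(String regex, String word) {
        Matcher m = Pattern.compile(regex).matcher(word);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] words = {"IranAnAn", "abc1212", "Balloo"};
        for (String w : words) {
            // IranAnAn -> An, An, An
            // abc1212  -> 12, null, null  (digits match \w but not [a-zA-Z])
            // Balloo   -> o, o, null      (a single repeated letter fails {2,})
            System.out.println(w + " -> "
                    + repeatedSuffix(WORD_CHARS, w) + ", "
                    + repeatedSuffix(LETTERS, w) + ", "
                    + repeatedSuffix(TWO_OR_MORE, w));
        }
    }
}
```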
assylias answered Oct 06 '22 01:10