
How do I use Java Regex to find all repeating character sequences in a string?

I am parsing an arbitrary string, looking for repeating sequences, using Java and regex.

Consider the string:

aaabbaaacccbb

I'd like to find a regular expression that will find all the matches in the above string:

aaabbaaacccbb
^^^  ^^^

aaabbaaacccbb
   ^^      ^^

What regular expression will check a string for any repeating sequences of characters and return the groups of those repeating characters, such that group 1 = aaa and group 2 = bb? Note that I've used an example string, but any repeating characters are valid: RonRonJoeJoe ... ... ,, ,,...,,

David Urry asked Apr 23 '12 20:04



2 Answers

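This does it. The pattern (.+)\1+ captures one or more characters in group 1 and then requires that same text to repeat at least once immediately afterward: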

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        String s = "aaabbaaacccbb";
        find(s);
        String s1 = "RonRonRonJoeJoe .... ,,,,";
        find(s1);
        System.err.println("---");
        String s2 = "RonBobRonJoe";
        find(s2);
    }

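    private static void find(String s) {
        // (.+) captures a unit; \1+ requires that unit to repeat right after it.
        Matcher m = Pattern.compile("(.+)\\1+").matcher(s);
        while (m.find()) {
            // group() is the whole run; group(1) would be only the repeated unit.
            System.err.println(m.group());
        }
    }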
}

OUTPUT:

aaa
bb
aaa
ccc
bb
RonRonRon
JoeJoe
....
,,,,
---
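
The question asks for the whole run (aaa, bb) as a capturing group rather than as the full match. A minimal sketch of one way to get that, assuming it is acceptable to wrap the pattern in an outer group: group 1 then holds the whole run and group 2 holds the unit that repeats. The class name RepeatedRuns and the helper findRuns are hypothetical names chosen for illustration, not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedRuns {
    // Outer group 1 = the whole run; inner group 2 = the unit that repeats.
    // (Assumption: same matching behavior as "(.+)\\1+", only the grouping differs.)
    private static final Pattern RUN = Pattern.compile("((.+)\\2+)");

    // Hypothetical helper: collects every run of consecutive repeats, in order.
    public static List<String> findRuns(String s) {
        List<String> runs = new ArrayList<>();
        Matcher m = RUN.matcher(s);
        while (m.find()) {
            runs.add(m.group(1)); // m.group(2) would give the repeated unit instead
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(findRuns("aaabbaaacccbb")); // prints [aaa, bb, aaa, ccc, bb]
        System.out.println(findRuns("RonBobRonJoe"));  // prints []
    }
}
```

Returning a list instead of printing keeps the matching logic reusable; swapping m.group(1) for m.group(2) would return a, b, a, c, b for the first string.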

Guillaume Polet answered Nov 05 '22 00:11


The code below should meet all the requirements. It combines a couple of the other answers here: group 1 captures a substring, and the lookahead (?=.*?\1) requires the same text to appear again later in the string, so it prints every substring that is repeated anywhere later in the string, not only consecutive repeats.

I set it to return only substrings of at least 2 characters, but you can allow single characters by changing "{2,}" in the regex to "+".

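import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedSubstrings
{
  public static void main(String[] args)
  {
    String s = "RonSamJoeJoeSamRon";
    // Group 1: two or more non-whitespace characters.
    // Lookahead: the same text must occur again later in the string.
    Matcher m = Pattern.compile("(\\S{2,})(?=.*?\\1)").matcher(s);
    while (m.find())
    {
      // The pattern has a single capturing group, so group(1) is the repeated substring.
      System.out.println(m.group(1));
    }
  }
}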

Output:
Ron
Sam
Joe
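
With the lookahead approach, a substring that occurs three or more times can be reported more than once (for "RonRonRon", the first two occurrences each see a later copy). A minimal sketch, assuming duplicates are unwanted and that the minimum length should be configurable; the class name LookaheadRepeats, the helper repeatedSubstrings, and the minLength parameter are hypothetical names introduced here for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadRepeats {
    // Hypothetical helper: returns each substring of at least minLength
    // non-whitespace characters that occurs again later in s, without duplicates.
    public static List<String> repeatedSubstrings(String s, int minLength) {
        // Same pattern as the answer above, with the lower bound made a parameter.
        Pattern p = Pattern.compile("(\\S{" + minLength + ",})(?=.*?\\1)");
        // LinkedHashSet drops duplicates while keeping first-seen order.
        Set<String> seen = new LinkedHashSet<>();
        Matcher m = p.matcher(s);
        while (m.find()) {
            seen.add(m.group(1));
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        System.out.println(repeatedSubstrings("RonSamJoeJoeSamRon", 2)); // prints [Ron, Sam, Joe]
        System.out.println(repeatedSubstrings("RonRonRon", 2));          // prints [Ron]
    }
}
```

Passing 1 as minLength corresponds to the "+" variant mentioned above; expect it to also report single repeated characters.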

Trevor Freeman answered Nov 05 '22 00:11