
Best algorithm to find a repeating pattern

What are the best algorithms available for finding the longest repeating patterns of characters in a string using .NET?

Jayantha Lal Sirisena asked Mar 15 '11 12:03


2 Answers

I guess you are talking about pattern discovery. Take a look at this elementary approach (source):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class Program {
  private static Dictionary<string, int> FindPatterns(string value) {
    // Stage 1: collect every substring of length 2..Length/2 as a candidate pattern.
    List<string> patternToSearchList = new List<string>();
    for (int i = 0; i < value.Length; i++) {
      for (int j = 2; j <= value.Length / 2; j++) {
        if (i + j <= value.Length) {
          patternToSearchList.Add(value.Substring(i, j));
        }
      }
    }
    // Stage 2: pattern matching (non-overlapping, case-insensitive).
    Dictionary<string, int> results = new Dictionary<string, int>();
    foreach (string pattern in patternToSearchList) {
      // Escape the candidate so regex metacharacters in the input are matched literally.
      int occurrence = Regex.Matches(value, Regex.Escape(pattern), RegexOptions.IgnoreCase).Count;
      if (occurrence > 1) {
        results[pattern] = occurrence;
      }
    }

    return results;
  }

  static void Main(string[] args) {
    Dictionary<string, int> result = FindPatterns("asdxgkeopgkajdflkjbpoijadadafhjkafikeoadkjhadfkjhocihakeo");
    // Print the most frequent patterns first.
    foreach (KeyValuePair<string, int> res in result.OrderByDescending(r => r.Value)) {
      Console.WriteLine("Pattern:" + res.Key + " occurrence:" + res.Value.ToString());
    }
    Console.Read();
  }
}

The algorithm consists of two stages:

  • Choose the candidate patterns
  • Find each pattern in the input string (a pattern-matching algorithm)

Pattern matching here is done with Regex. There are other, more advanced algorithms; they are listed at http://www-igm.univ-mlv.fr/~lecroq/string/, although the code samples there are written in C. You could also take a look at the Boyer-Moore algorithm for pattern matching, written in C#.
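
To give a rough idea of what a Boyer-Moore-style matcher looks like in C#, here is a minimal sketch of the Horspool simplification (bad-character table only, not the full Boyer-Moore algorithm). The names `HorspoolSearch` and `CountOccurrences` are made up for this illustration. Note that it is case-sensitive and counts overlapping matches, so its counts can differ from what `Regex.Matches` returns.

```csharp
using System;
using System.Collections.Generic;

public static class HorspoolSearch
{
    // Counts occurrences of pattern in text (case-sensitive, overlapping matches included).
    // Hypothetical helper for illustration; assumes text and pattern are not null.
    public static int CountOccurrences(string text, string pattern)
    {
        if (pattern.Length == 0 || pattern.Length > text.Length) return 0;

        int m = pattern.Length;

        // Bad-character table: how far the window may move when its last
        // character is c. Characters not in the table allow a full shift of m.
        var shift = new Dictionary<char, int>();
        for (int i = 0; i < m - 1; i++)
        {
            shift[pattern[i]] = m - 1 - i;
        }

        int count = 0;
        int pos = 0;
        while (pos <= text.Length - m)
        {
            // Compare the window against the pattern from right to left.
            int j = m - 1;
            while (j >= 0 && text[pos + j] == pattern[j]) j--;
            if (j < 0) count++;

            // Move the window according to the character under its last position.
            char last = text[pos + m - 1];
            pos += shift.TryGetValue(last, out int s) ? s : m;
        }
        return count;
    }
}
```

For example, `HorspoolSearch.CountOccurrences("abcabcab", "abc")` returns 2, and such a method could stand in for the Regex call in the second stage if case-sensitive matching is acceptable.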

Oleg Svechkarenko answered Sep 28 '22 19:09


Pseudocode:

For N=1 to InputString.Length-1
  rotatedString = RotateStringByN(InputString,N)
  For i=0 to InputString.Length-1
     StringResult[i] = if (rotatedString[i]==InputString[i]) then
                            InputString[i]
                       else
                            Convert.ToChar(0x0)
  RepeatedStrings[] = String.Split(StringResult, Convert.ToChar(0x0))
  SaveLongestStringFrom(RepeatedStrings)
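
A possible C# rendering of this idea is sketched below. It deviates from the pseudocode in two ways, both of which are my assumptions rather than part of the answer: it compares the string with a shifted copy instead of a rotated one, so matches cannot wrap around the end, and it caps each run at the shift distance so the two copies of the pattern do not overlap. The names `ShiftCompare` and `LongestRepeat` are hypothetical.

```csharp
using System;

public static class ShiftCompare
{
    // Returns the longest substring that occurs at least twice without
    // overlapping, or "" if there is none. Assumes input is not null.
    public static string LongestRepeat(string input)
    {
        string best = "";
        for (int shift = 1; shift < input.Length; shift++)
        {
            int run = 0; // length of the current run of matching characters
            for (int i = 0; i + shift < input.Length; i++)
            {
                if (input[i] == input[i + shift])
                {
                    // Cap the run at the shift so the two copies cannot overlap.
                    run = Math.Min(run + 1, shift);
                    if (run > best.Length)
                    {
                        best = input.Substring(i - run + 1, run);
                    }
                }
                else
                {
                    run = 0;
                }
            }
        }
        return best;
    }
}
```

For instance, `ShiftCompare.LongestRepeat("abcabc")` returns "abc". The sketch runs in quadratic time and keeps only the current best match, which is usually fine for short inputs.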

Alternatively, take a look at this SO thread for other solutions.

Agnius Vasiliauskas answered Sep 28 '22 19:09