I am looking for an algorithm that will find the number of repeating substrings in a single string.
For this, I was looking for some dynamic programming algorithms but didn't find any that would help me. I just want some tutorial on how to do this.
Let's say I have a string ABCDABCDABCD
. The expected output for this would be 3
, because there is ABCD
3 times.
For input AAAA
, output would be 4
, since A
is repeated 4 times.
For input ASDF
, output would be 1
, since every individual character is repeated 1 time only.
I hope that someone can point me in the right direction. Thank you.
Python has a built-in function for counting the repeated substring in a given string called count(). As the name suggests, it counts the occurrence of a substring in a given string.
If the length of string is N, then there will be N*(N+1)/2 possible substrings.
Python string count() is an inbuilt function that returns the number of occurrences of the substring in the given string. The count() method searches the substring in the given string and returns how many times the substring is present in it.
I am taking the following assumptions:
ABCDABC
, ABC
would not count as a repeating substring, but it would in case of ABCABC
.ABCABC
, ABC
would not count as a repeating substring.AAAA
, the answer should be 4
(a
is the substring) rather than 2
(aa
is the substring).Under these assumptions, the algorithm is as follows:
inputString
.failure[]
. This operation if of linear time complexity with respect to the length of the string. So, by definition, failure[i]
denotes the length of the longest proper-prefix of the substring inputString[0....i]
that is also a proper-suffix of the same substring.len = inputString.length - failure.lastIndexValue
. At this point, we know that if there is any repeating string at all, then it has to be of this length len
. But we'll need to check for that; First, just check if len
perfectly divides inputString.length
(that is, inputString.length % len == 0
). If yes, then check if every consecutive (non-overlapping) substring of len
characters is the same or not; this operation is again of linear time complexity with respect to the length of the input string.inputString.length
/ len
. Otherwise, the answer is simply inputString.length
, as there is no such repeating substring present.The overall time complexity would be O(n)
, where n
is the number of characters in the input string.
A sample code for calculating the KMP failure array is given here.
For example,
Let the input string be abcaabcaabca
.
Its KMP failure array would be - [0, 0, 0, 1, 1, 2, 3, 4, 5, 6, 7, 8]
.
So, our len
= (12 - 8) = 4.
And every consecutive non-overlapping substring of length 4
is the same (abca
).
Therefore the answer is 12/4
= 3
. That is, abca
is repeated 3 times repeatedly.
The solution for this with C# is:
class Program
{
public static string CountOfRepeatedSubstring(string str)
{
if (str.Length < 2)
{
return "-1";
}
StringBuilder substr = new StringBuilder();
// Length of the substring cannot be greater than half of the actual string
for (int i = 0; i < str.Length / 2; i++)
{
// We will iterate through half of the actual string and
// create a new string by appending the current character to the previous character
substr.Append(str[i]);
String clearedOfNewSubstrings = str.Replace(substr.ToString(), "");
// We will remove the newly created substring from the actual string and
// check if the length of the actual string, cleared of the newly created substring, is 0.
// If 0 it tells us that it is only made of its substring
if (clearedOfNewSubstrings.Length == 0)
{
// Next we will return the count of the newly created substring in the actual string.
var countOccurences = Regex.Matches(str, substr.ToString()).Count;
return countOccurences.ToString();
}
}
return "-1";
}
static void Main(string[] args)
{
// Input: {"abcdaabcdaabcda"}
// Output: 3
// Input: { "abcdaabcdaabcda" }
// Output: -1
// Input: {"barrybarrybarry"}
// Output: 3
var s = "asdf"; // Output will be -1
Console.WriteLine(CountOfRepeatedSubstring(s));
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With