I recetly come across an interview question : To find all the repeating substring in a given string with a minimal size of 2. The algorithm should be efficient one.
Code for above question is given below but it isn't efficient one.
#include <iostream>
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
using namespace std;
int main()
{
typedef string::const_iterator iterator;
string s("ABCFABHYIFAB");
set<string> found;
if (2 < s.size())
for (iterator i = s.begin() + 1, j = s.end(); i != j; ++i)
for (iterator x = s.begin(); x != i; ++x)
{
iterator tmp = mismatch(i, j, x).second;;
if (tmp - x > 1)
found.insert(string(x, tmp));
}
copy(found.begin(), found.end(),ostream_iterator<string>(cout, "\n"));
}
My question is that, is there any data structure which can implement above question in time complexity of O(N)?
If your answer is Suffix tree or Hashing please elaborate it.
Python has a built-in function for counting the repeated substring in a given string called count(). As the name suggests, it counts the occurrence of a substring in a given string.
If you analyze the output for the string "AAAAAAAAAAAAAA"
, then there are O(n²) characters in it, so the algorithm is at least O(n²).
To achieve O(n²), just build the suffix tree for each suffix of s (indices [1..n], [2..n], [3..n], ..., [n..n]). It doesn't matter if one of the strings has no own end node, just count how often each node is used.
At the end, iterate over each node with count>1 and print its path.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With