
To find all the repeating substring in a given string

I recently came across an interview question: find all the repeating substrings of length at least 2 in a given string. The algorithm should be efficient.

My code for this question is given below, but it isn't efficient.

#include <iostream>
#include <algorithm>
#include <iterator>
#include <set>
#include <string>

using namespace std;

int main()
{
    typedef string::const_iterator iterator;
    string s("ABCFABHYIFAB");
    set<string> found;  // distinct repeats, sorted

    if (2 < s.size())
        // Compare every suffix [i, j) with every earlier suffix starting at x.
        for (iterator i = s.begin() + 1, j = s.end(); i != j; ++i)
            for (iterator x = s.begin(); x != i; ++x)
            {
                // tmp marks the end of the common prefix, measured from x.
                iterator tmp = mismatch(i, j, x).second;
                if (tmp - x > 1)  // keep matches of length >= 2
                    found.insert(string(x, tmp));
            }

    copy(found.begin(), found.end(), ostream_iterator<string>(cout, "\n"));
}

My question: is there a data structure that solves this problem in O(N) time?

If your answer is a suffix tree or hashing, please elaborate.

IndieProgrammer asked Apr 07 '12 13:04




1 Answer

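If you analyze the output for the string "AAAAAAAAAAAAAA", it contains O(n²) characters in total, so any algorithm that prints every repeated substring needs at least quadratic time in the worst case. O(N) is therefore not achievable.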
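
As a rough check of that bound, assume the input is n copies of the same character. The repeated substrings of length at least 2 are then the runs of length k = 2, ..., n-1 (the full string occurs only once), so the output size is:

$$\sum_{k=2}^{n-1} k \;=\; \frac{n(n-1)}{2} - 1 \;=\; \Theta(n^2)$$

For the 14-character example above this gives 90 characters of output.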

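To achieve O(n²), build a suffix tree in its simplest, uncompressed form: insert every suffix of s (indices [1..n], [2..n], [3..n], ..., [n..n]) into one tree, character by character. It doesn't matter that some suffixes do not end at a node of their own; just count how many times each node is visited during the insertions.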

At the end, iterate over each node with count > 1 (and, for the minimum size of 2, a depth of at least 2) and print the path from the root to that node.
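
The steps above could be sketched roughly as follows. This is only one possible reading of the answer, not the answerer's own code: the function name `repeated_substrings`, the `Node` layout, and the use of `std::map` for child links are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original post): returns every substring
// of length >= 2 that occurs at least twice in s (occurrences may overlap).
std::set<std::string> repeated_substrings(const std::string& s)
{
    struct Node {
        std::map<char, std::size_t> next;  // child index for each character
        std::size_t count = 0;             // suffixes that pass through this node
    };
    std::vector<Node> trie(1);             // trie[0] is the root

    // Insert every suffix s[i..n) and count the visits to each node.
    for (std::size_t i = 0; i < s.size(); ++i) {
        std::size_t cur = 0;
        for (std::size_t k = i; k < s.size(); ++k) {
            std::map<char, std::size_t>::iterator it = trie[cur].next.find(s[k]);
            std::size_t child;
            if (it == trie[cur].next.end()) {
                child = trie.size();
                trie[cur].next[s[k]] = child;  // link first, then grow the vector
                trie.emplace_back();
            } else {
                child = it->second;
            }
            cur = child;
            ++trie[cur].count;
        }
    }

    // Walk the trie and collect the path of every node with count > 1
    // whose depth (the path length) is at least 2.
    std::set<std::string> found;
    std::vector<std::pair<std::size_t, std::string> > stack;
    stack.emplace_back(std::size_t(0), std::string());
    while (!stack.empty()) {
        std::pair<std::size_t, std::string> top = stack.back();
        stack.pop_back();
        if (trie[top.first].count > 1 && top.second.size() >= 2)
            found.insert(top.second);
        for (const auto& edge : trie[top.first].next)
            if (trie[edge.second].count > 1)  // counts never grow further down
                stack.emplace_back(edge.second, top.second + edge.first);
    }
    return found;
}
```

Calling `repeated_substrings("ABCFABHYIFAB")` yields AB, FA and FAB. Note that this reports every repeated substring, including "FA", which is a prefix of the longer repeat "FAB". The trie can hold O(n²) nodes, so the sketch trades memory for simplicity.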

ipc answered Nov 15 '22 16:11