Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Longest substring that appears at least twice in O(n.logn)

Problem:

Given a String S of N characters (N <= 200 000), find the length of the longest substring that appears at least twice (the substrings can overlap).

My solution:

Here's what i've tried:

int main()
{
    std::string s;
    std::cin >> s;
    int max = 0;
    typedef std::string::const_iterator sit;
    sit end = s.end();
    for(sit it1 = s.begin(); it1 != end; ++it1)
        for(sit it2 = it1 + 1; it2 != end; ++it2)
            max = std::max(max, std::mismatch(it1, it1 + (end - it2), it2).first - it1);
    std::cout << max;
}

Question:

However, the above solution will get TLE as it runs in O(n^3). Is there any ways to improve it so it can run in O(n.logn)?

like image 862
unglinh279 Avatar asked Aug 28 '21 05:08

unglinh279


People also ask

How do you find the longest repeating substring?

Longest Repeating Substring in C++ Suppose we have a string S, we have to find the length of the longest repeating substring(s). We will return 0 if no repeating substring is present. So if the string is like “abbaba”, then the output will be 2. As the longest repeating substring is “ab” or “ba”.

How do you find the longest substring in a string without repeating characters?

Sliding WIndow Approach To check if a substring is present in another string can be done in O(N^2). Algorithm: Run a loop from i = 0 till N – 1 and consider a visited array. Run a nested loop from j = i + 1 to N – 1 and check whether the current character S[j] has already been visited.

What is the longest common substring in a string?

The longest common substring is “abcdez” and is of length 6. Let m and n be the lengths of first and second strings respectively. A simple solution is to one by one consider all substrings of first string and for every substring check if it is a substring in second string.

How to find the longest common substring in O (m*n) time?

Dynamic Programming can be used to find the longest common substring in O (m*n) time. The idea is to find length of the longest common suffix for all substrings of both strings and store these lengths in a table.

What is the length of the longest non-repeating character substring?

The input string is geeksforgeeks The length of the longest non-repeating character substring is 7 Time Complexity: O (n + d) where n is length of the input string and d is number of characters in input string alphabet. For example, if string consists of lowercase English characters then value of d is 26. Auxiliary Space: O (d)

How to find the longest common suffix of a string?

Dynamic Programming can be used to find the longest common substring in O(m*n) time. The idea is to find length of the longest common suffix for all substrings of both strings and store these lengths in a table. The longest common suffix has following optimal substructure property. If last characters match, then we reduce both lengths by 1


2 Answers

find the length of the longest substring that appears at least twice (the substrings can overlap)

This problem is also commonly known as Longest repeated substring problem. It can be solved in linear time with a suffix tree.

To solve this problem:

  1. Add a special character '$' to the given string S),
  2. build a suffix tree from S ;
  3. the longest repeated substring of S is indicated by the deepest internal node in the suffix tree, where depth is measured by the number of characters traversed from the root.

Time complexity:

  • Suffix tree takes O(nlog(k))) time, where k is the size of the alphabet (if k is considered to be a constant, the asymptotic behaviour is linear)
  • tree traversal(for finding longest repeated substring) can be done in O(n) time
like image 91
prakash sellathurai Avatar answered Nov 02 '22 10:11

prakash sellathurai


Suffix tree is an overkill for this problem. In fact, binary search suffices and the implementation is much easier.

Idea

The idea is simple: If there exists a duplicated substring of length N (N > 1), there must also exists one of length N - 1. Therefore, if we let f(x) to denote a function that returns true if there exists a duplicated substring of length x, f(x) will be a monotonic function, which allows a binary search solution.

Implementation

Binary search on the length of the longest duplicated substring and apply sliding windows to check if a given length is possible. Use string hashing to check for duplicate. Complexity: N log N

like image 42
Learning Mathematics Avatar answered Nov 02 '22 09:11

Learning Mathematics