Finding the substring of a string that maximizes the product of its length and its number of occurrences

Question

I was thinking about the following problem: given a string S, let the length of the i-th substring be l_i and its number of occurrences in S be o_i. Print the substring for which l_i * o_i is maximized.

I have an O(n³) brute-force solution that generates every substring and keeps the one with the maximum value. My code is as follows:

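public static void solve(String S) {
    long max = Long.MIN_VALUE;   // best length * occurrences seen so far
    String res = "";
    // i is the start index, j is the length of the candidate substring
    for (int i = 0; i < S.length(); i++) {
        for (int j = 1; j <= S.length() - i; j++) {
            String s = S.substring(i, i + j);
            int o = countOccurrences(S, s);   // number of times s occurs in S
            long p = (long) o * s.length();   // widen to long to avoid int overflow
            if (max < p) {
                max = p;
                res = s;
            }
        }
    }
    System.out.println(res);
}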

Here the countOccurrences() method takes O(n) time. Is there a more efficient way to do this?
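The countOccurrences() helper is not shown in the question. A minimal sketch of what it might look like follows, assuming that overlapping occurrences are counted separately (the question does not say). The class name is hypothetical. Note that String.indexOf is not guaranteed to run in linear time, so this version may be slower than the O(n) stated above; a KMP-style matcher would be needed to guarantee that bound.

```java
final class OccurrenceCount {
    // Hypothetical stand-in for the question's countOccurrences().
    // Counts matches of pattern in text, including overlapping ones.
    static int countOccurrences(String text, String pattern) {
        if (pattern.isEmpty()) {
            return 0; // treat the empty pattern as having no occurrences
        }
        int count = 0;
        int from = text.indexOf(pattern);
        while (from >= 0) {
            count++;
            // Advance by one character so overlapping matches are found too.
            from = text.indexOf(pattern, from + 1);
        }
        return count;
    }
}
```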

j_random_hacker · Accepted Answer

Here's a linear-time algorithm:

1. Build a suffix tree on the input string. This takes O(n) time and space.
2. Traverse the suffix tree in a postorder DFS, computing the number of leaf descendants of each node by summing the values of its children (a leaf counts as 1). As soon as this count is known for a node, multiply it by the node's string depth (the sum of the lengths of all edges on the path from the root) and update the best-so-far score if necessary. This also takes O(n) time.
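The counting step can be sketched as follows. To keep the example short it uses an uncompressed suffix trie rather than a real suffix tree, so it takes O(n²) time and space and is not the linear-time construction described above; only the postorder pass is the same idea. The class and method names are made up for illustration, and the recursion depth equals the string length, so this is suitable only for small inputs.

```java
import java.util.HashMap;
import java.util.Map;

final class SuffixTrieScore {
    // One node per distinct substring of the input.
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        final int start; // start index of one occurrence of this node's string
        int ends = 0;    // number of suffixes that end exactly at this node
        Node(int start) { this.start = start; }
    }

    // Returns a substring maximizing length * occurrences (ties broken arbitrarily).
    static String bestSubstring(String s) {
        Node root = new Node(0);
        for (int i = 0; i < s.length(); i++) {
            final int start = i;
            Node cur = root;
            for (int j = i; j < s.length(); j++) {
                cur = cur.next.computeIfAbsent(s.charAt(j), c -> new Node(start));
            }
            cur.ends++;
        }
        long[] best = {0, 0, 0}; // {score, start index, length}
        dfs(root, 0, best);
        return s.substring((int) best[1], (int) (best[1] + best[2]));
    }

    // Postorder DFS: a node's occurrence count is the number of suffixes
    // that end at it or below it.
    private static int dfs(Node node, int depth, long[] best) {
        int occurrences = node.ends;
        for (Node child : node.next.values()) {
            occurrences += dfs(child, depth + 1, best);
        }
        long score = (long) depth * occurrences;
        if (score > best[0]) {
            best[0] = score;
            best[1] = node.start;
            best[2] = depth;
        }
        return occurrences;
    }
}
```

For example, SuffixTrieScore.bestSubstring("abcabcabc") returns "abcabc", which occurs twice for a score of 12.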

The key points are that

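1. A suffix tree contains only a linear number of nodes, and
2. Any substring that does not end exactly at a node cannot produce a maximal score. This is because as you trace it from the suffix tree root it must stop "partway down" some edge, but you can always extend it further without reducing the number of occurrences (which is the number of leaf descendants), and thus increase the score, by continuing on down to the next node. For example, in "abcabc" the substring "ab" occurs twice (score 4) and ends partway along an edge; extending it to "abc" keeps both occurrences and raises the score to 6. Leaves count as nodes here: a leaf is a suffix that occurs once, so the whole string is always a candidate with score n.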

It might also be possible to do this using suffix arrays instead of suffix trees, in which case it's likely to require a constant factor less memory, but add a logarithmic factor to the running time.
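A rough sketch of the suffix-array variant follows. It rests on the observation that a substring of length h occurring k times corresponds to k consecutive suffixes in the suffix array whose k-1 adjacent LCP values are all at least h, so the best score is a "largest rectangle in a histogram" scan over the LCP array, plus the whole string as a single-occurrence candidate. The suffix array here is built by naively sorting suffixes, which is far from optimal and is only an assumption made to keep the example short; the LCP computation (Kasai's algorithm) and the stack scan are both linear. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

final class SuffixArrayScore {
    // Returns a substring maximizing length * occurrences (ties broken arbitrarily).
    static String bestSubstring(String s) {
        int n = s.length();
        if (n == 0) return "";

        // Naive suffix array: sort start positions by comparing whole suffixes.
        Integer[] sa = new Integer[n];
        for (int i = 0; i < n; i++) sa[i] = i;
        Arrays.sort(sa, (a, b) -> s.substring(a).compareTo(s.substring(b)));

        // Kasai's algorithm: lcp[r] = common prefix length of suffixes sa[r-1] and sa[r].
        int[] rank = new int[n];
        for (int i = 0; i < n; i++) rank[sa[i]] = i;
        int[] lcp = new int[n];
        int h = 0;
        for (int i = 0; i < n; i++) {
            if (rank[i] > 0) {
                int j = sa[rank[i] - 1];
                while (i + h < n && j + h < n && s.charAt(i + h) == s.charAt(j + h)) h++;
                lcp[rank[i]] = h;
                if (h > 0) h--;
            } else {
                h = 0;
            }
        }

        // The whole string occurs once, so it is the baseline candidate.
        long best = n;
        int bestStart = 0;
        int bestLen = n;

        // Monotonic stack over lcp[1..n-1]; r == n acts as a sentinel of height 0.
        Deque<Integer> stack = new ArrayDeque<>();
        for (int r = 1; r <= n; r++) {
            int cur = (r < n) ? lcp[r] : 0;
            while (!stack.isEmpty() && lcp[stack.peek()] >= cur) {
                int height = lcp[stack.pop()];
                int left = stack.isEmpty() ? 1 : stack.peek() + 1;
                // lcp[left..r-1] are all >= height, so the suffixes
                // sa[left-1..r-1] share a prefix of that length.
                long occurrences = r - left + 1;
                long score = height * occurrences;
                if (score > best) {
                    best = score;
                    bestStart = sa[r - 1];
                    bestLen = height;
                }
            }
            if (r < n) stack.push(r);
        }
        return s.substring(bestStart, bestStart + bestLen);
    }
}
```

With a faster suffix array construction substituted for the naive sort, the total running time would be dominated by that construction step.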

Tags: string, algorithm, data-structures

n00bc0d3r
