Find longest substring s.t the count of each character does not exceed the number of unique characters

Question

Given a string, I want to find the longest substring such that the count of each character does not exceed the number of unique characters in that substring.

For example:

Input: 'aaabb'
Output: 'aabb'

I understand how to do this brute-force: Generate all substrings while tracking the count of each character and the number of distinct characters in each substring. If there's a substring where the condition is violated, we move the start of the window forward. Else, we move the end of the window forward.

The brute-force approach is in O(n^2). Is there a more efficient way to do this?

Kaia · Accepted Answer

Here's an O(n^2) algorithm, I think. (note that I don't think the O(n^2) solution given in the question works)

O(n^2) in the `n >>> # of unique characters` case:

Let S be the input string.

Scan S and produce C the set of distinct characters in S. Let |C| denote the length of C.
For each character c in C, produce a cumulative-occurrences array O_c, where each O_c[i] is the number of c in the substring S[0:i]. This is O(n * |C|).
Given a start and endpoint, we can get the number of characters c in the substring via O_c[end] - O_c[start]. Obtain the maximum occurrences and the number of non-zero occurrences and use these to check validity.
Using 3, check each start and endpoint in O(n^2 * |C|). If we can assume that the number of characters is small compared to n, that's O(n^2).

It seems like if there is an improvement over O(n^2), it'd be some way to avoid checking every start and endpoint in 4.

O(n) in the `n >>> (# of unique characters)^2` case:

If for some substring S[i:j] there are more occurrences of some character c than |C|, then any S[i:k] with k > j is an invalid substring (because all the unique characters in the string cannot make up for the current occurrences of c). Thus, we can immediately increment i once we find this condition.
For substring of length l, at least one character will have at least ceil(l / |C|) occurrences (pigeonhole argument).
So the maximal length of a valid substring is |C|^2.

Find longest substring s.t the count of each character does not exceed the number of unique characters

Tags:

algorithm

Mark Carothers

1 Answers

O(n^2) in the `n >>> # of unique characters` case:

O(n) in the `n >>> (# of unique characters)^2` case:

Kaia

Recent Activity

Donate For Us

Find longest substring s.t the count of each character does not exceed the number of unique characters

Tags:

algorithm

Mark Carothers

1 Answers

O(n^2) in the n >>> # of unique characters case:

O(n) in the n >>> (# of unique characters)^2 case:

Kaia

Related questions

Recent Activity

Donate For Us

O(n^2) in the `n >>> # of unique characters` case:

O(n) in the `n >>> (# of unique characters)^2` case: