Given a string, I want to find the longest substring such that the count of each character does not exceed the number of unique characters in that substring.
For example:
Input: 'aaabb'
Output: 'aabb'
I understand how to do this brute-force: Generate all substrings while tracking the count of each character and the number of distinct characters in each substring. If there's a substring where the condition is violated, we move the start of the window forward. Else, we move the end of the window forward.
The brute-force approach is in O(n^2)
. Is there a more efficient way to do this?
Here's an O(n^2) algorithm, I think. (note that I don't think the O(n^2) solution given in the question works)
n >>> # of unique characters
case:Let S be the input string.
S
and produce C
the set of distinct characters in S
. Let |C|
denote the length of C
.c
in C, produce a cumulative-occurrences array O_c
, where each O_c[i]
is the number of c
in the substring S[0:i]
. This is O(n * |C|).c
in the substring via O_c[end] - O_c[start]
. Obtain the maximum occurrences and the number of non-zero occurrences and use these to check validity.n
, that's O(n^2).It seems like if there is an improvement over O(n^2), it'd be some way to avoid checking every start and endpoint in 4.
n >>> (# of unique characters)^2
case:S[i:j]
there are more occurrences of some character c
than |C|
, then any S[i:k]
with k > j
is an invalid substring (because all the unique characters in the string cannot make up for the current occurrences of c
). Thus, we can immediately increment i
once we find this condition.l
, at least one character will have at least ceil(l / |C|)
occurrences (pigeonhole argument).|C|^2
.If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With