Question:
Given a string S S.length() <= 5.10^5
and an integer K K <= S.length()
. For each removal, you can:
How can I do exactly K removals such that the final string has minimum lexicographical order?
Example:
S= "abacaaba", K = 2
The final string: "aacaaa" which is the smallest lexicographical possible.
P/S:
I've tried for many days but can't figure out an efficience way to solve this problem. But I think there's something to do with dynamic programming.
Approach: In order to get the lexicographically smallest string, we need to take the minimum character from the first K characters every time we choose a character from str. To do that, we can put the first K characters in a priority_queue (min-heap) and then choose the smallest character and append it to X.
The smallest lexicographical order is an order relation where string s is smaller than t, given the first character of s (s1) is smaller than the first character of t (t1), or in case they are equivalent, the second character, etc.
Create a simple integer array(hash[]) of size 26. Each of its index will denote an alphabet, considering your string is only comprised of small alphabets. Initialize the array to zero. Now traverse the string and for each occurred alphabet in string add its frequency by one in hash[].
All together, these ideas should lead to a linear-time algorithm.
If K ≤ N−4, the final string has at least four characters. Its two-character prefix is the least two-character subsequence of the (K+2)-character prefix of the initial string. Compute this prefix and the possible positions of its second character. This can be accomplished in O(K) time by scanning through first K+2 characters, maintaining the least character so far and the least two-character subsequence so far.
Now that we know the two-character prefix, we just have to determine the best suffix. For a prefix that required J deletions to set up, the final string continues with the next N−4 − K characters that we can't touch, followed by the least two-character subsequence of the (K+2 − J)-character suffix of the initial string. We can compute the least two-character subsequence of each of the relevant suffixes using the scanning algorithm described previously. The one tricky part is comparing the untouchable middles efficiently. This can be accomplished with some difficulty using a suffix array with longest common prefixes.
If K > N−4, just return the least (N−K)-character subsequence.
Interesting task!
All combinations with length M, which consist of 3'th and 4'th remove operations are equal to this class of operations: Zero or more 3 after that zero or more 4, like this regexp: (3)(4) You can prove it:
So you can pick rightmost 3 and with rule 3 make it the only one 3. And with rule 4 make transform all left 4 to 3.
any ->rule3-> 4...434...4 -> rule1-> 3...34...4
It leads to O(K^3) complexity of step 6 brute force.
There are some ideas and solution that works nice in common
In common case this algorithm work's well. But in worst case it's weak. There is counterpoint: Maxlength string with same letter. Then we have K candidates and algorithm complexity will be O(K^2) - it's not good for this task.
For deal with it i think we can choose right candidate at 6'th step of algorithm:
6*. For two candidates - compare their suffix - letters after it. Candidate with smaller letter at same tail position (tail position counts from this candidate head position) is better for our purposes.
7*. Compare two possibilities form 5'th algorithm step and choose minimal.
Problem of this (*) approach - i cannot get a rigid proof that it's better solution. Most hard part, when one candidate is a prefix of another - we compare it letter by letter until smallest doesn't end. for example in string abcabcabc...abc with candidate at first and fourth position.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With