Question: Given a string S (S.length() <= 5·10^5) and an integer K (K <= S.length()). For each removal, you can:

Remove the first character of the string
Remove the second character of the string
Remove the last character of the string
Remove the second-to-last character of the string

How can I do exactly K removals such that the final string is the lexicographically smallest possible?

Example: S = "abacaaba", K = 2

Remove the second character of the string
Remove the second-to-last character of the string

The final string is "aacaaa", which is the lexicographically smallest possible.

P.S. I've tried for many days but can't figure out an efficient way to solve this problem. I think it has something to do with dynamic programming.

Minimum Lexicographical String after K removals of first, second, last or penultimate characters

2 Answers

All together, these ideas should lead to a linear-time algorithm.

If K ≤ N−4, the final string has at least four characters. Its two-character prefix is the least two-character subsequence of the (K+2)-character prefix of the initial string. Compute this prefix and the possible positions of its second character. This can be accomplished in O(K) time by scanning through the first K+2 characters, maintaining the least character so far and the least two-character subsequence so far.
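The scan described above can be sketched as follows. This is a minimal illustration, not code from the answer: the names `PairScan` and `leastPairInPrefix` are made up for the example, and indices are 0-based.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper type: the least two-character subsequence of a prefix
// and every index at which its second character can sit.
struct PairScan {
    std::string best;
    std::vector<int> secondPos;
};

// Scans s[0, len) once, keeping the least character seen so far and the
// least pair (least earlier character, current character) seen so far.
PairScan leastPairInPrefix(const std::string& s, int len) {
    assert(len >= 2 && len <= static_cast<int>(s.size()));
    PairScan r;
    char minSoFar = s[0];
    for (int i = 1; i < len; ++i) {
        std::string cand(1, minSoFar);
        cand.push_back(s[i]);
        if (r.best.empty() || cand < r.best) {
            r.best = cand;            // strictly better pair: restart the list
            r.secondPos.assign(1, i);
        } else if (cand == r.best) {
            r.secondPos.push_back(i); // same pair, another possible position
        }
        if (s[i] < minSoFar) minSoFar = s[i];
    }
    return r;
}
```

For the question's example (S = "abacaaba", K = 2) the scan runs over the first K+2 = 4 characters, "abac", and yields the pair "aa" with its second character at index 2.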

Now that we know the two-character prefix, we just have to determine the best suffix. For a prefix that required J deletions to set up, the final string continues with the next N−K−4 characters that we can't touch, followed by the least two-character subsequence of the (K+2−J)-character suffix of the initial string. We can compute the least two-character subsequence of each of the relevant suffixes using the scanning algorithm described previously. The one tricky part is comparing the untouchable middles efficiently. This can be accomplished with some difficulty using a suffix array together with a longest-common-prefix (LCP) array.
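To make the structure of the candidates concrete, here is a reference sketch that builds each candidate string explicitly and keeps the smallest. It is deliberately not the linear-time method: it swaps the suffix array for direct string comparison, so it can cost O(K·N) in the worst case and is only suitable for checking small inputs. The function names are hypothetical, indices are 0-based, and K ≤ N−4 is assumed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Least two-character subsequence of s[lo, hi); if secondPos is given, it
// receives every index at which the pair's second character can sit.
static std::string leastPair(const std::string& s, int lo, int hi,
                             std::vector<int>* secondPos = nullptr) {
    std::string best;
    char minSoFar = s[lo];
    for (int i = lo + 1; i < hi; ++i) {
        std::string cand(1, minSoFar);
        cand.push_back(s[i]);
        if (best.empty() || cand < best) {
            best = cand;
            if (secondPos) secondPos->assign(1, i);
        } else if (cand == best && secondPos) {
            secondPos->push_back(i);
        }
        if (s[i] < minSoFar) minSoFar = s[i];
    }
    return best;
}

// Reference version: try every position of the prefix's second character.
std::string minAfterRemovalsQuadratic(const std::string& s, int k) {
    const int n = static_cast<int>(s.size());
    assert(k >= 0 && k <= n - 4);
    std::vector<int> secondPos;
    const std::string head = leastPair(s, 0, k + 2, &secondPos);
    const int midLen = n - k - 4;  // characters no operation can touch
    std::string best;
    for (int p : secondPos) {
        const int j = p - 1;  // deletions spent at the front
        // The remaining k - j deletions act on the last k - j + 2 characters.
        std::string cand = head + s.substr(p + 1, midLen) +
                           leastPair(s, n - (k - j + 2), n);
        if (best.empty() || cand < best) best = cand;
    }
    return best;
}
```

On the question's example, `minAfterRemovalsQuadratic("abacaaba", 2)` builds the single candidate "aa" + "ca" + "aa" and returns "aacaaa".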

If K > N−4, just return the least (N−K)-character subsequence.
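The least subsequence of a given length can be computed with the usual monotonic-stack technique. The sketch below assumes, as the answer states, that when fewer than four characters remain the four operations can realize any subsequence; `leastSubsequence` is a name chosen for this example.

```cpp
#include <cassert>
#include <string>

// Returns the lexicographically least subsequence of s with exactly m
// characters, using a greedy stack: drop a larger earlier character
// whenever a smaller one arrives and deletions are still available.
std::string leastSubsequence(const std::string& s, int m) {
    const int n = static_cast<int>(s.size());
    assert(m >= 0 && m <= n);
    int toDrop = n - m;
    std::string out;
    for (char c : s) {
        while (!out.empty() && toDrop > 0 && out.back() > c) {
            out.pop_back();
            --toDrop;
        }
        out.push_back(c);
    }
    out.resize(m);  // unused deletions are taken from the tail
    return out;
}
```

For the small case in the answer, the call would be `leastSubsequence(S, N - K)`.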

answered Oct 21 '22 15:10

David Eisenstat

Interesting task!

Update: step 5 was incorrect. Here is the corrected version:

Every combination of length M that consists only of operations 3 and 4 (remove the last character, remove the second-to-last character) is equivalent to a combination from this class: zero or more 3s followed by zero or more 4s, like the regexp (3)*(4)*. You can prove it:

1. The pair 43 is equivalent to 33.
2. The triple 343 is equivalent to 443.
3. More generally, 34...43 is equivalent to 44...43.

So you can pick the rightmost 3 and, with rule 3, make it the only 3. Then, with rule 1, transform all the 4s to its left into 3s.

any -> (rule 3) -> 4...434...4 -> (rule 1) -> 3...34...4

This leads to O(K^3) complexity for the brute force in step 6.
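The equivalences can be checked by simulation. The sketch below is an illustration, not part of the answer: `applyTailOps` applies a sequence of operations written as the characters '3' and '4', and `canonicalTailOps` rewrites a sequence into the 3...34...4 form. It uses the observation that the surviving last character is fixed by the position of the rightmost 3.

```cpp
#include <cassert>
#include <string>

// Applies tail operations in order: '3' removes the last character,
// '4' removes the second-to-last character.
std::string applyTailOps(std::string s, const std::string& ops) {
    for (char op : ops) {
        assert(s.size() >= 2);
        if (op == '3') {
            s.pop_back();
        } else {
            s.erase(s.size() - 2, 1);
        }
    }
    return s;
}

// Rewrites ops as a block of 3s followed by a block of 4s with the same
// effect: everything up to and including the rightmost '3' becomes '3'.
std::string canonicalTailOps(const std::string& ops) {
    const std::size_t last3 = ops.rfind('3');
    const std::size_t threes = (last3 == std::string::npos) ? 0 : last3 + 1;
    return std::string(threes, '3') + std::string(ops.size() - threes, '4');
}
```

For example, on "abcdef" both 43 and 33 leave "abcd", and both 343 and 443 leave "abc".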

Original answer

Here are some ideas and a solution that works well in the common case.

1. [A shorter word is smaller in lexicographical order.] Wrong, as @n. 1.8e9-where's-my-share m. mentioned: all possible results have the same length (Length − K), because we must use all K removals.
2. Lexicographical order means: for words of the same length, we compare symbols from left to right for as long as they are equal. The result of the word comparison is the result of comparing the first differing characters. So minimizing the i-th symbol is more important than minimizing the (i+j)-th symbol for every positive j.
3. So minimizing the first symbol matters most. Only the first removal operation can influence it. With the first removal operation we try to put the minimal possible value in first place (it will be the minimal value among the first K symbols). If several positions hold the minimal letter, we pick the leftmost one (we don't want to delete extra symbols and lose the correct answer).
4. Now the second letter matters most, so we want to minimize it too, in the same way as in step 3. But here we use the second removal operation, and if several positions are minimal, we keep all of them as candidates.
5. All combinations of length M that consist of the third and fourth removal operations are equivalent to only two combinations:

all operations are the fourth one: 44...44
all operations are the fourth one, except that the last is the third: 44...43

So for every candidate we need to check only two possibilities.

6. Brute force all candidates with both possibilities and find the minimum.

In the common case this algorithm works well, but in the worst case it is weak. A counterexample: a maximum-length string that consists of a single repeated letter. Then we have K candidates and the algorithm's complexity is O(K^2), which is not good enough for this task.

To deal with this, I think we can choose the right candidate at step 6 of the algorithm:

6*. For two candidates, compare their suffixes (the letters after each candidate). The candidate with the smaller letter at the same tail position (tail positions are counted from the candidate's head position) is better for our purposes.

7*. Compare the two possibilities from step 5 and choose the minimal one.

The problem with this (*) approach is that I cannot give a rigorous proof that it is a better solution. The hardest part is when one candidate is a prefix of another: we compare them letter by letter until the shorter one ends, for example in the string abcabcabc...abc with candidates at the first and fourth positions.

answered Oct 21 '22 13:10

Nikxp

Tags:

c++

algorithm

dynamic-programming

unglinh279
