Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Remove consecutive duplicates in a string to make the smallest string

Tags:

algorithm

Given a string and the constraint of matching on >= 3 characters, how can you ensure that the result string will be as small as possible?

edit with gassa's explicitness:

E.G. 'AAAABBBAC'

If I remove the B's first, AAAA[BBB]AC -- > AAAAAC, then I can remove all of the A's from the resultant string and be left with:

[AAAAA]C --> C

'C'

If I just remove what is available first (the sequence of A's), I get:

[AAAA]BBBAC -- > [BBB]AC --> AC

'AC'

like image 716
Anisotropic Avatar asked Apr 05 '18 01:04

Anisotropic


2 Answers

A tree would definitely get you the shortest string(s).

The tree solution:

  1. Define a State (node) for each current string Input and all its removable sub-strings' int[] Indexes.
  2. Create the tree: For each int index create another State and add it to the parent state State[] Children.
  3. A State with no possible removable sub-strings has no children Children = null.
  4. Get all Descendants State[] of your root State. Order them by their shortest string Input. And that is/are your answer(s).

Test cases:

string result = FindShortest("AAAABBBAC");      // AC
string result2 = FindShortest("AABBAAAC");      // AABBC
string result3 = FindShortest("BAABCCCBBA");    // B

The Code:

Note: Of-course everyone is welcome to enhance the following code in terms of performance and/or fixing any bug.

class Program
{
    static void Main(string[] args)
    {
        string result = FindShortest("AAAABBBAC");      // AC
        string result2 = FindShortest("AABBAAAC");      // AABBC
        string result3 = FindShortest("BAABCCCBBA");    // B
    }

    // finds the FIRST shortest string for a given input
    private static string FindShortest(string input)
    {
        // all possible removable strings' indexes
        // for this given input
        int[] indexes = RemovableIndexes(input);

        // each input string and its possible removables are a state
        var state = new State { Input = input, Indexes = indexes };

        // create the tree
        GetChildren(state);

        // get the FIRST shortest
        // i.e. there would be more than one answer sometimes
        // this could be easily changed to get all possible results
        var result = 
            Descendants(state)
            .Where(d => d.Children == null || d.Children.Length == 0)
            .OrderBy(d => d.Input.Length)
            .FirstOrDefault().Input;


        return result;
    }

    // simple get all descendants of a node/state in a tree
    private static IEnumerable<State> Descendants(State root)
    {
        var states = new Stack<State>(new[] { root });
        while (states.Any())
        {
            State node = states.Pop();
            yield return node;
            if (node.Children != null)
                foreach (var n in node.Children) states.Push(n);
        }
    }

    // creates the tree
    private static void GetChildren(State state)
    {
        // for each an index there is a child
        state.Children = state.Indexes.Select(
                i =>
                {
                    var input = RemoveAllAt(state.Input, i);
                    return input.Length < state.Input.Length && input.Length > 0
                    ? new State
                    {
                        Input = input,
                        Indexes = RemovableIndexes(input)
                    }
                    : null;
                }).ToArray();

        foreach (var c in state.Children)
            GetChildren(c);
    }

    // find all possible removable strings' indexes
    private static int[] RemovableIndexes(string input)
    {
        var indexes = new List<int>();

        char d = input[0];
        int count = 1;
        for (int i = 1; i < input.Length; i++)
        {
            if (d == input[i])
                count++;
            else
            {
                if (count >= 3)
                    indexes.Add(i - count);

                // reset
                d = input[i];
                count = 1;
            }
        }
        if (count >= 3)
            indexes.Add(input.Length - count);


        return indexes.ToArray();
    }

    // remove all duplicate chars starting from an index
    private static string RemoveAllAt(string input, int startIndex)
    {
        string part1, part2;

        int endIndex = startIndex + 1;
        int i = endIndex;
        for (; i < input.Length; i++)
            if (input[i] != input[startIndex])
            {
                endIndex = i;
                break;
            }

        if (i == input.Length && input[i - 1] == input[startIndex])
            endIndex = input.Length;

        part1 = startIndex > 0 ? input.Substring(0, startIndex) : string.Empty;
        part2 = endIndex <= (input.Length - 1) ? input.Substring(endIndex) : string.Empty;

        return part1 + part2;
    }

    // our node, which is 
    // an input string & 
    // all possible removable strings' indexes
    // & its children
    public class State
    {
        public string Input;
        public int[] Indexes;

        public State[] Children;
    }
}
like image 149
Bassem Akl Avatar answered Sep 28 '22 05:09

Bassem Akl


I propose O(n^2) solution with dynamic programming.

Let's introduce notation. Prefix and suffix of length l of string A denoted by P[l] and S[l]. And we call our procedure Rcd.

  1. Rcd(A) = Rcd(Rcd(P[n-1])+S[1])
  2. Rcd(A) = Rcd(P[1]+Rcd(S[n-1]))

Note that outer Rcd in the RHS is trivial. So, that's our optimal substructure. Based on this i came up with the following implementation:

#include <iostream>
#include <string>
#include <vector>
#include <cassert>
using namespace std;

string remdupright(string s, bool allowEmpty) {
    if (s.size() >= 3) {
        auto pos = s.find_last_not_of(s.back());
        if (pos == string::npos && allowEmpty) s = "";
        else if (pos != string::npos && s.size() - pos > 3) s = s.substr(0, pos + 1);
    }
    return s;
}

string remdupleft(string s, bool allowEmpty) {
    if (s.size() >= 3) {
        auto pos = s.find_first_not_of(s.front());
        if (pos == string::npos && allowEmpty) s = "";
        else if (pos != string::npos && pos >= 3) s = s.substr(pos);
    }
    return s;
}

string remdup(string s, bool allowEmpty) {
    return remdupleft(remdupright(s, allowEmpty), allowEmpty);
}

string run(const string in) {
    vector<vector<string>> table(in.size());
    for (int i = 0; i < (int)table.size(); ++i) {
        table[i].resize(in.size() - i);
    }
    for (int i = 0; i < (int)table[0].size(); ++i) {
        table[0][i] = in.substr(i,1);
    }

    for (int len = 2; len <= (int)table.size(); ++len) {
        for (int pos = 0; pos < (int)in.size() - len + 1; ++pos) {
            string base(table[len - 2][pos]);
            const char suffix = in[pos + len - 1];
            if (base.size() && suffix != base.back()) {
                base = remdupright(base, false);
            }
            const string opt1 = base + suffix;

            base = table[len - 2][pos+1];
            const char prefix = in[pos];
            if (base.size() && prefix != base.front()) {
                base = remdupleft(base, false);
            }
            const string opt2 = prefix + base;

            const string nodupopt1 = remdup(opt1, true);
            const string nodupopt2 = remdup(opt2, true);

            table[len - 1][pos] = nodupopt1.size() > nodupopt2.size() ? opt2 : opt1;
            assert(nodupopt1.size() != nodupopt2.size() || nodupopt1 == nodupopt2);
        }
    }
    string& res = table[in.size() - 1][0];
    return remdup(res, true);
}

void testRcd(string s, string expected) {
    cout << s << " : " << run(s) << ", expected: " << expected << endl;
}

int main()
{
    testRcd("BAABCCCBBA", "B");
    testRcd("AABBAAAC", "AABBC");
    testRcd("AAAA", "");
    testRcd("AAAABBBAC", "C");
}

You can check default and run your tests here.

like image 34
Yola Avatar answered Sep 28 '22 05:09

Yola