Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Marking multiplicity in a bag of strings

Tags:

algorithm

I have a bag of strings that I want to map to another bag, such that duplicates are post-pended with the multiplicity of that member when discovered, while preserving ordering. For example, given:

["a", "b", "a", "c", "b", "a"]

I want:

["a", "b", "a #1", "c", "b #1", "a #2"]

(As this is a partially-ordered bag, ["a", "a #1", "a #2", "b", "b #1", "c"] is not a valid result.)

An obvious solution builds the multiplicity set (for the example above, {a:3, b:2, c:1}) and is O(n) in time and O(n) in space:

function mark(names) {
    var seen = {};
    for (var i = 0; i < names.length; i++) {
        var name = names[i];
        if (name in seen) {
            names[i] = name + ' #' + seen[name];
            seen[name]++;
        } else {
            seen[name] = 1;
        }
    }

    return names;
};

My question: is there a non-obvious solution that has better overall complexity? Or said differently, what other ways are there to implement this algorithm that better handle the worst case when the bag is actually a set (of very large size)?

Are there other approaches if the poset requirement is removed?

like image 299
bishop Avatar asked Feb 02 '26 20:02

bishop


1 Answers

Which complexity are you trying to lower? There isn't going to be a way to make time less than O(n) because you're at minimum going to spit out a list that's n long which means you're doing at least n operations. You also can't get lower than O(n) total space either, because your output is going to be n long.

The working space for the "seen" would be O(m) where m is the number of unique entries into the array if you used a hashmap or something simliar. As m<=n necessarily you still can't get lower than O(n).

If you are looking for space saving and the input comes in sorted you could do it with O(1) working space just counting up until you see a new character and resetting your counter. Again it doesn't get you lower than O(n) including your output list, but that's impossible.

like image 146
Eric Barr Avatar answered Feb 05 '26 12:02

Eric Barr