
Strategy to modify permutation algorithm to prevent duplicate printouts

I've been reviewing algorithms for practice, and I'm currently looking at a permutation algorithm that I quite like:

void permute(char* set, int begin, int end) {
    int range = end - begin;

    if (range == 1)
        cout << set << endl;
    else {
        for(int i = 0; i < range; ++i) {
            swap(&set[begin], &set[begin+i]);
            permute(set, begin+1, end);
            swap(&set[begin], &set[begin+i]);
        }
    }
}

I actually wanted to apply this to a situation where there will be many repeated characters though, so I need to be able to modify it to prevent the printing of duplicate permutations.

How would I go about detecting that I was generating a duplicate? I know I could store this in a hash or something similar, but that's not an optimal solution - I'd prefer one that didn't require extra storage. Can someone give me a suggestion?

PS: I don't want to use the STL permutation mechanisms, and I don't want a reference to another "unique permutation algorithm" somewhere. I'd like to understand the mechanism used to prevent duplication so I can build it into this algorithm and learn from it, if possible.

John Humphreys asked Feb 29 '12 03:02


2 Answers

There is no general way to prevent arbitrary functions from generating duplicates. You can always filter out the duplicates, of course, but you don't want that, and for very good reasons. So you need a special way to generate only non-duplicates.

One way would be to generate the permutations in increasing lexicographical order. Then you can just check whether a "new" permutation is the same as the previous one and skip it if so. It gets even better: the algorithm for generating permutations in increasing lexicographical order given at http://en.wikipedia.org/wiki/Permutations#Generation_in_lexicographic_order doesn't generate the duplicates in the first place!
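For reference, here is a minimal Python sketch of that lexicographic approach, following the steps described on the Wikipedia page. The function names next_permutation and unique_permutations_lex are made up for this illustration, and the input is sorted first so that the enumeration starts at the smallest permutation.

```python
def next_permutation(a):
    """Rearrange list a in place into its next permutation in lexicographic order.

    Returns False (leaving a unchanged) if a is already the last permutation.
    """
    # 1. Find the largest k with a[k] < a[k + 1].
    k = len(a) - 2
    while k >= 0 and a[k] >= a[k + 1]:
        k -= 1
    if k < 0:
        return False
    # 2. Find the largest l > k with a[k] < a[l].
    l = len(a) - 1
    while a[l] <= a[k]:
        l -= 1
    # 3. Swap the two elements and reverse the tail after position k.
    a[k], a[l] = a[l], a[k]
    a[k + 1:] = reversed(a[k + 1:])
    return True


def unique_permutations_lex(s):
    """Yield each distinct permutation of s exactly once, in sorted order."""
    a = sorted(s)
    yield "".join(a)
    while next_permutation(a):
        yield "".join(a)


print(list(unique_permutations_lex("ABB")))  # ['ABB', 'BAB', 'BBA']
```

Because the comparisons in steps 1 and 2 are strict, equal characters are never swapped past each other, which is why no duplicate ever shows up.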

However, that is not an answer to your question, as it is a different algorithm (although based on swapping, too).

So, let's look at your algorithm a little closer. One key observation is:

  • Once a character is swapped to position begin, it will stay there for all nested calls of permute.

We'll combine this with the following general observation about permutations:

  • If you permute a string s, but only among positions that hold the same character, s will remain the same. In fact, two permutations that are duplicates of each other differ only in the order of the occurrences of some character c; c still occupies the same positions in both.
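This second observation can be checked by brute force on a small input. The sketch below is only an illustration (the function name ordered_occurrence_permutations is made up here): it walks over all arrangements of the positions of s and keeps only those in which equal characters appear in their original relative order.

```python
from itertools import permutations


def ordered_occurrence_permutations(s):
    """Brute force: keep only the arrangements of positions in which
    equal characters appear in their original relative order."""
    result = []
    for order in permutations(range(len(s))):
        last_index = {}  # character -> original index of its latest occurrence so far
        in_order = True
        for idx in order:
            c = s[idx]
            if last_index.get(c, -1) > idx:
                # a later occurrence of c was placed before an earlier one
                in_order = False
                break
            last_index[c] = idx
        if in_order:
            result.append("".join(s[idx] for idx in order))
    return result


kept = ordered_occurrence_permutations("ABB")
print(kept)  # ['ABB', 'BAB', 'BBA']
```

Each distinct string survives exactly once, because exactly one ordering of its equal characters matches the original order.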

OK, so all we have to do is to make sure that the occurrences of each character are always in the same order as in the beginning. Code follows, but... I don't really speak C++, so I'll use Python and hope to get away with claiming it's pseudo code.

We start with your original algorithm, rewritten in 'pseudo code':

def permute(s, begin, end):
    if end == begin + 1:
        print(s)
    else:
        for i in range(begin, end):
            s[begin], s[i] = s[i], s[begin]
            permute(s, begin + 1, end)
            s[begin], s[i] = s[i], s[begin]

and a helper function that makes calling it easier:

def permutations_w_duplicates(s):
    permute(list(s), 0, len(s)) # use a list, as in Python strings are not mutable

Now we extend the permute function with some bookkeeping about how many times a certain character has been swapped to the begin position (i.e. has been fixed), and we also remember the original order of the occurrences of each character (char_number). Each character that we try to swap to the begin position then has to be the next one in the original order, i.e. the number of fixes for a character defines which original occurrence of this character may be fixed next. I call this next_fixable.
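To make the bookkeeping concrete, here is how the occurrence numbers come out for a sample string. The helper name occurrence_numbers is made up for this illustration; it computes the same values that char_number holds initially.

```python
def occurrence_numbers(s):
    """For each position, count how many times that character occurred before it."""
    count = {}
    numbers = []
    for c in s:
        numbers.append(count.get(c, 0))
        count[c] = count.get(c, 0) + 1
    return numbers


# F O O Q U U X F O O
print(occurrence_numbers("FOOQUUXFOO"))  # [0, 0, 1, 0, 0, 1, 0, 1, 2, 3]
```

Since next_fixable starts at 0 for every character, only the occurrences numbered 0 may be fixed first; occurrence 1 of a character becomes eligible only while occurrence 0 is fixed, and so on.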

def permute2(s, next_fixable, char_number, begin, end):
    if end == begin + 1:
        print(s)
    else:
        for i in range(begin, end):
            # only the next not-yet-fixed occurrence of s[i] may be fixed here
            if next_fixable[s[i]] == char_number[i]:
                next_fixable[s[i]] += 1
                char_number[begin], char_number[i] = char_number[i], char_number[begin]

                s[begin], s[i] = s[i], s[begin]
                permute2(s, next_fixable, char_number, begin + 1, end)
                s[begin], s[i] = s[i], s[begin]

                # undo the bookkeeping (after the swap back, s[i] is the same character again)
                char_number[begin], char_number[i] = char_number[i], char_number[begin]
                next_fixable[s[i]] -= 1

Again, we use a helper function:

def permutations_wo_duplicates(s):
    alphabet = set(s)
    next_fixable = dict.fromkeys(alphabet, 0)
    count = dict.fromkeys(alphabet, 0)
    char_number = [0] * len(s)
    for i, c in enumerate(s):
        char_number[i] = count[c]
        count[c] += 1

    permute2(list(s), next_fixable, char_number, 0, len(s))

That's it!

Almost. You can stop here and rewrite in C++ if you like, but if you are interested in some test data, read on.

I used slightly different code for testing, because I didn't want to print all permutations. In Python, you would replace the print with a yield, which turns the function into a generator function. Its result can be iterated over with a for loop, and the permutations are computed only when needed. This is the real code and test I used:

def permute2(s, next_fixable, char_number, begin, end):
    if end == begin + 1:
        yield "".join(s) # join the characters to form a string
    else:
        for i in range(begin, end):
            if next_fixable[s[i]] == char_number[i]:
                next_fixable[s[i]] += 1
                char_number[begin], char_number[i] = char_number[i], char_number[begin]
                s[begin], s[i] = s[i], s[begin]
                for p in permute2(s, next_fixable, char_number, begin + 1, end):
                    yield p
                s[begin], s[i] = s[i], s[begin]
                char_number[begin], char_number[i] = char_number[i], char_number[begin]
                next_fixable[s[i]] -= 1

def permutations_wo_duplicates(s):
    alphabet = set(s)
    next_fixable = dict.fromkeys(alphabet, 0)
    count = dict.fromkeys(alphabet, 0)
    char_number = [0] * len(s)
    for i, c in enumerate(s):
        char_number[i] = count[c]
        count[c] += 1

    for p in permute2(list(s), next_fixable, char_number, 0, len(s)):
        yield p


s = "FOOQUUXFOO"
A = list(permutations_w_duplicates(s))
print("%s has %s permutations (counting duplicates)" % (s, len(A)))
print("permutations of these that are unique: %s" % len(set(A)))
B = list(permutations_wo_duplicates(s))
print("%s has %s unique permutations (directly computed)" % (s, len(B)))

print("The first 10 permutations       :", A[:10])
print("The first 10 unique permutations:", B[:10])

And the result:

FOOQUUXFOO has 3628800 permutations (counting duplicates)
permutations of these that are unique: 37800
FOOQUUXFOO has 37800 unique permutations (directly computed)
The first 10 permutations       : ['FOOQUUXFOO', 'FOOQUUXFOO', 'FOOQUUXOFO', 'FOOQUUXOOF', 'FOOQUUXOOF', 'FOOQUUXOFO', 'FOOQUUFXOO', 'FOOQUUFXOO', 'FOOQUUFOXO', 'FOOQUUFOOX']
The first 10 unique permutations: ['FOOQUUXFOO', 'FOOQUUXOFO', 'FOOQUUXOOF', 'FOOQUUFXOO', 'FOOQUUFOXO', 'FOOQUUFOOX', 'FOOQUUOFXO', 'FOOQUUOFOX', 'FOOQUUOXFO', 'FOOQUUOXOF']

Note that the permutations are computed in the same order as by the original algorithm, just without the duplicates. Note also that 37800 * 2! * 2! * 4! = 3628800, just as you would expect (F and U each occur twice and O occurs four times).
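The expected count can also be computed directly, without generating anything: divide n! by the factorial of each character's multiplicity. A small sketch (the function name count_unique_permutations is made up for this illustration):

```python
from collections import Counter
from math import factorial


def count_unique_permutations(s):
    """Number of distinct permutations of s: n! / (c1! * c2! * ...)."""
    total = factorial(len(s))
    for n in Counter(s).values():
        total //= factorial(n)  # each intermediate division is exact
    return total


print(count_unique_permutations("FOOQUUXFOO"))  # 37800
```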

Reinstate Monica answered Oct 16 '22 10:10


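You could add an if statement that skips the swap when an equal character has already been tried at position begin in the same loop, i.e. when set[begin+i] also occurs somewhere in set[begin] .. set[begin+i-1]. Comparing against set[begin] alone is not enough: for "ABB" both B's differ from the A at begin, so both would be swapped in, and "BAB" and "BBA" would each be printed twice. The for loop is then

for(int i = 0; i < range; ++i) {
    // skip set[begin+i] if an equal character was already tried at this position
    bool seen = false;
    for(int j = 0; j < i && !seen; ++j)
        seen = (set[begin+j] == set[begin+i]);
    if(!seen) {
      swap(&set[begin], &set[begin+i]);
      permute(set, begin+1, end);
      swap(&set[begin], &set[begin+i]);
    }
}

For i == 0 the inner loop is empty, so the recursive call always happens at least once, even if all the characters of the set are the same.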
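A quick experiment helps when judging a pruning rule like this. The Python sketch below mirrors the loop and contrasts two rules: comparing only against the character at begin, and scanning everything from begin up to (but not including) i for an equal character. The names permute_pruned, run and scan_prefix are made up for this illustration; the slice s[begin:i] is a convenience, and a C++ version would use a plain inner loop instead.

```python
def permute_pruned(s, begin, end, out, scan_prefix):
    if end - begin == 1:
        out.append("".join(s))
        return
    for i in range(begin, end):
        if scan_prefix:
            # skip if an equal character was already tried at position begin
            skip = s[i] in s[begin:i]
        else:
            # skip only if s[i] equals the character currently at begin
            skip = i != begin and s[i] == s[begin]
        if skip:
            continue
        s[begin], s[i] = s[i], s[begin]
        permute_pruned(s, begin + 1, end, out, scan_prefix)
        s[begin], s[i] = s[i], s[begin]


def run(text, scan_prefix):
    out = []
    permute_pruned(list(text), 0, len(text), out, scan_prefix)
    return out


print(run("ABB", scan_prefix=False))  # ['ABB', 'BAB', 'BBA', 'BBA', 'BAB']
print(run("ABB", scan_prefix=True))   # ['ABB', 'BAB', 'BBA']
```

On "ABB", comparing only against the character at begin still lets both B's be swapped to the front, so two permutations appear twice; scanning the range from begin to i removes them.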

user537390 answered Oct 16 '22 10:10