Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiently calculating unique permutations in a set

I'm currently calculating the unique permutations of an array of data. While the following code is working, it's not as efficient as I would like. Once I get over 6 or 8 items, it becomes very slow and I start running into memory issues.

Here is the code and an explanation

<?php
function permuteUnique($items, $count = false, $perms = [], &$return = []) {
    if ($count && count($return) == $count) return $return;

    if (empty($items)) {
        $duplicate = false;

        foreach ($return as $a) {
            if ($a === $perms) {
                $duplicate = true;
                break;
            }
        }
        if (!$duplicate) $return[] = $perms;
    } else {
        for ($i = count($items) - 1; $i >= 0; --$i) {
            $newitems = $items;
            $newperms = $perms;
            list($tmp) = array_splice($newitems, $i, 1);
            array_unshift($newperms, $tmp);
            permuteUnique($newitems, $count, $newperms, $return);
        }
        return $return;
    }
}

function factorial($n) {
    $f = 1;
    for ($i = 2; $i <= $n; $i++) $f *= $i;
    return $f;
}

Given the input [1, 1, 2] I receive the following output as expected

array (size=3)
  0 => 
    array (size=3)
      0 => int 1
      1 => int 1
      2 => int 2
  1 => 
    array (size=3)
      0 => int 1
      1 => int 2
      2 => int 1
  2 => 
    array (size=3)
      0 => int 2
      1 => int 1
      2 => int 1

The $count parameter is so I can pass the number of unique permutations I expect to the function and once it has found that many, it can stop calculating and return the data. This is calculated as the factorial of the total number of items divided by the product of the factorial of the count of all duplicates. I'm not sure I said that right so let me show you an example.

Given the set [1, 2, 2, 3, 4, 4, 4, 4] the count of unique permutations is calculated as 8! / (2!4!) = 840 because there are 8 total items, one of them is duplicated twice, and another is duplicated 4 times.

Now if I translate that to php code...

<?php
$set = [1, 2, 2, 3, 4, 4, 4, 4];
$divisor = 1;

foreach (array_count_values($set) as $v) {
    $divisor *= factorial($v);
}

$count = factorial(count($set)) / $divisor;
$permutations = permuteUnique($set, $count);

it's pretty slow. If I throw a counter into the permuteUnique function, it runs over 100k times before it finds the 840 unique permutations.

I would like to find a way to reduce this and find the shortest possible path to the unique permutations. I appreciate any help or advice you can give.

like image 218
Rob Avatar asked Sep 21 '13 18:09

Rob


People also ask

How do you calculate unique permutations?

The number of permutations of n distinct objects, taken r at a time is: Pr = n! / (n - r)! Thus, 210 different 3-digit numbers can be formed from the digits 1, 2, 3, 4, 5, 6, and 7.

How do you remove duplicate permutations?

1. Collect the permutations in a vector instead of printing them; 2. Remove duplicates from the vector; 3. Print the vector elements.

What is distinct permutation?

A permutation of a set of distinct objects is. an arrangement of the objects in a specific order without repetition.


2 Answers

So I spent some more time thinking about this and here is what I came up with.

<?php
function permuteUnique($items, $perms = [], &$return = []) {
    if (empty($items)) {
        $return[] = $perms;
    } else {
        sort($items);
        $prev = false;
        for ($i = count($items) - 1; $i >= 0; --$i) {
            $newitems = $items;
            $tmp = array_splice($newitems, $i, 1)[0];
            if ($tmp != $prev) {
                $prev = $tmp;
                $newperms = $perms;
                array_unshift($newperms, $tmp);
                permuteUnique($newitems, $newperms, $return);
            }
        }
        return $return;
    }
}

$permutations = permuteUnique([1, 2, 2, 3, 4, 4, 4, 4]);

Previous stats

Uniques: 840
Calls to permuteUnique: 107,591
Duplicates found: 38737
Execution time (seconds): 4.898668050766

New stats

Uniques: 840
Calls to permuteUnique: 2647
Duplicates found: 0
Execution time (seconds): 0.0095300674438477

So all I really did was sort the data set, keep track of the previous item, and not calculate permutations if the current item matched the previous. I also no longer have to pre calculate the amount of uniques and iterate through the permutations to check for duplicates. That made a world of difference.

like image 161
Rob Avatar answered Nov 14 '22 21:11

Rob


I just tried the "Generation in lexicographic order" way on the wiki, and it generates the same result for your "1,2,2,3,4,4,4,4" sample, so I guess it is correct. Here is the code:

function &permuteUnique($items) {
    sort($items);
    $size = count($items);
    $return = [];
    while (true) {
        $return[] = $items;
        $invAt = $size - 2;
        for (;;$invAt--) {
            if ($invAt < 0) {
                break 2;
            }
            if ($items[$invAt] < $items[$invAt + 1]) {
                break;
            }
        }
        $swap1Num = $items[$invAt];
        $inv2At = $size - 1;
        while ($swap1Num >= $items[$inv2At]) {
            $inv2At--;
        }
        $items[$invAt] = $items[$inv2At];
        $items[$inv2At] = $swap1Num;
        $reverse1 = $invAt + 1;
        $reverse2 = $size - 1;
        while ($reverse1 < $reverse2) {
            $temp = $items[$reverse1];
            $items[$reverse1] = $items[$reverse2];
            $items[$reverse2] = $temp;
            $reverse1++;
            $reverse2--;
        }
    }
    return $return;
}

Profiling the time for your example input: the above method: 2600,3000,3000,2400,2400,3000; your "Calls to permuteUnique: 2647" method: 453425.6,454425.4,454625.8. In your this example input, it is around 500 times faster :) If you are processing the result one by one (I guess you will), using this non-recursive method, you can process one generated and then generate the next (instead of generating all and store all before processing).

like image 37
daifei4321 Avatar answered Nov 14 '22 22:11

daifei4321