
Performance tips for finding unique permutation

TL;DR: How can I find a permutation of a multidimensional array in PHP, and how can it be optimized for bigger arrays?

This is a continuation of this question: how to find multidimensional array permutation in php

We have a script for rearranging an array. The idea is to find a permutation of the array in which no value repeats on the same key. The rules for this permutation are:

  1. The input array contains a set of arrays.
  2. Each inner array contains unique elements.
  3. Inner arrays may differ in length and in values.
  4. The output array must contain exactly the same values.
  5. Output inner arrays must have unique values on the same key (no value may repeat within a column).
  6. If there is no solution, wildcards (e.g. null) are allowed.
  7. Wildcards may be duplicated on the same key.
  8. The solution should use as few wildcards as possible.
  9. The algorithm should handle arrays up to 30x30 in less than 180 s.
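
Rule 4 is easy to break by accident while experimenting, so here is a minimal sketch of a check for it. It assumes the rule is meant per row: once wildcards are dropped, each output row must hold exactly the values of the matching input row. The name rows_preserved and the $wildcard parameter are made up for illustration and are not part of the script below.

```php
<?php
// Hypothetical helper: verifies rule 4 row by row.
// $wildcard is whatever placeholder the solver uses (null, '' or '_').
function rows_preserved(array $input, array $output, $wildcard = null): bool
{
    $input = array_values($input);
    $output = array_values($output);
    if (count($input) !== count($output)) {
        return false;
    }
    foreach ($input as $i => $row) {
        // Drop wildcards from the output row, then compare sorted copies.
        $kept = array_values(array_filter(
            $output[$i],
            function ($v) use ($wildcard) { return $v !== $wildcard; }
        ));
        $expected = array_values($row);
        sort($kept);
        sort($expected);
        if ($kept !== $expected) {
            return false;
        }
    }
    return true;
}
```

For example, rows_preserved([['X'], ['X']], [['X', '_'], ['_', 'X']], '_') accepts the padded output, while an output that loses or invents a value is rejected.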

I have this solution so far:

function matrix_is_solved(array $matrix) {
    foreach (array_keys(current($matrix)) as $offset) {
        $column = array_filter(array_column($matrix, $offset)); // drop the '' padding
        if (count($column) != count(array_unique($column))) return false;
    }
    return true;
}

function matrix_generate_vectors(array $matrix) {
    $vectors = [];
    $columns = count(current($matrix));
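    // Build each vector as an array of offsets; concatenating digits into a
    // string and splitting it again breaks once an offset reaches 10.
    $gen = function ($depth = 0, array $combo = []) use (&$gen, &$vectors, $columns) {
        if ($depth < $columns) {
            for ($i = 0; $i < $columns; $i++) {
                $gen($depth + 1, array_merge([$i], $combo));
            }
        } else {
            $vectors[] = $combo;
        }
    };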
    $gen();
    return $vectors;
}

function matrix_rotate(array $matrix, array $vector) {
   foreach ($matrix as $row => &$values) {
       array_rotate($values, $vector[$row]);
   }
   return $matrix;
}

function matrix_brute_solve(array $matrix) {
    matrix_make_square($matrix);
    foreach (matrix_generate_vectors($matrix) as $vector) {
        $attempt = matrix_rotate($matrix, $vector);
        if (matrix_is_solved($attempt))
            return matrix_display($attempt);
    }
    echo 'No solution';
}

function array_rotate(array &$array, $offset) {
    foreach (array_slice($array, 0, $offset) as $key => $val) {
        unset($array[$key]);
        $array[$key] = $val;
    }
    $array = array_values($array);
}

function matrix_display(array $matrix) {
    echo "[\n";
    foreach ($matrix as $row => $inner) {
        echo "  $row => ['" . implode("', '", $inner) . "']\n";
    }
    echo "]\n";
}

function matrix_make_square(array &$matrix) {
    $pad = count(array_keys($matrix));
    foreach ($matrix as &$row)
        $row = array_pad($row, $pad, '');
}

$tests = [
[ ['X'], ['X'] ],
[ ['X'], ['X'], ['X'] ],
[ [ 'X', '' ], [ '', 'X' ] ],
[ ['X', 'Y', 'Z'], ['X', 'Y'], ['X']],
[ ['X', 'Y'], ['X', 'Y'], ['X', 'Y'] ]
];
array_map(function ($matrix) {
    matrix_display($matrix);
    echo "solved by:" . PHP_EOL;
    matrix_brute_solve($matrix);
    echo PHP_EOL;
}, $tests);

This works perfectly on small arrays, but the trouble is that it iterates over every possible combination of row rotations, and for an array like 6x6 this is just too much to compute: O(n^n) in both time and space!
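
To put numbers on that: matrix_generate_vectors() builds n^n vectors for n columns, which is 46,656 for n = 6 and roughly 2 x 10^44 for n = 30. The space half of the problem can be avoided by producing the vectors lazily. The sketch below is a hypothetical replacement (the name matrix_vectors_lazy is made up) that yields the same set of vectors, possibly in a different order, while holding only one in memory. The running time is still n^n, so it does not make 30x30 feasible.

```php
<?php
// Lazily yields every rotation vector for $columns columns, one at a time.
// Hypothetical stand-in for matrix_generate_vectors().
function matrix_vectors_lazy(int $columns): Generator
{
    if ($columns < 1) {
        return;
    }
    $vector = array_fill(0, $columns, 0);
    while (true) {
        yield $vector;
        // Increment like an odometer in base $columns, rightmost digit first.
        $pos = $columns - 1;
        while ($pos >= 0 && ++$vector[$pos] === $columns) {
            $vector[$pos] = 0;
            $pos--;
        }
        if ($pos < 0) {
            return; // wrapped around: every vector has been produced
        }
    }
}
```

It could be dropped into the brute-force loop as foreach (matrix_vectors_lazy(count(current($matrix))) as $vector) { ... }, so the search stops at the first solved attempt without ever building the full list.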

Asked Nov 13 '17 09:11 by DorienCragen


1 Answer

The solution is actually quite simple. You count the number of unique chars, and that is the initial number of values in each inner array of the output. The code below will do what you want almost instantly.

The hard part is removing the wildcards. If you want 100% certainty, this is something you can only do with brute force. The solution below tries its best to remove all wildcards by switching positions several times in order.
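
One cheap way to judge how well the wildcard removal did is to compare the result against a lower bound. A row of length L needs at least L columns, and a value that appears in R rows needs at least R columns, since it can occupy each column only once. The sketch below computes that bound. It assumes a rectangular output, and the names min_columns and min_wildcards are made up for illustration. If a result reaches the bound, no arrangement can use fewer wildcards.

```php
<?php
// Hypothetical helper: smallest number of columns any valid output can have.
function min_columns(array $input): int
{
    $longest = 0;
    $frequency = [];
    foreach ($input as $row) {
        $longest = max($longest, count($row));
        foreach ($row as $value) {
            // Count in how many rows each value occurs (rows hold unique values).
            $frequency[$value] = ($frequency[$value] ?? 0) + 1;
        }
    }
    return max($longest, $frequency ? max($frequency) : 0);
}

// Hypothetical helper: wildcards needed to fill a rows x min_columns grid.
function min_wildcards(array $input): int
{
    $cells = array_sum(array_map('count', $input));
    return count($input) * min_columns($input) - $cells;
}
```

For three rows of ['X', 'Y'] this gives 3 columns and 3 wildcards, because X and Y each occur in three rows.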

This is similar to the way Google handles the Traveling Salesman Problem in its OR-Tools. You need to find the best balance between accuracy and speed. Setting the loop count higher in the function below increases the chance of success, but it will be slower.

/* HELPERS */

function ShowNice($output) {
  //nice output:
  echo '<pre>';
  foreach($output as $key=>$val) {
    echo '<br />' . str_pad($key,2," ",STR_PAD_LEFT) . ' => [';
    $first = true;
    foreach($val as $char) {
      if (!$first) {
        echo ', ';
      }
      echo "'".$char."'";
      $first = false;
    }
    echo ']';
  }
  echo '</pre>';
}

function TestValid($output, $nullchar) {
  $keys = count($output[0]);
  for ($i=0;$i<$keys;$i++) {
    $found = [];
    foreach($output as $key=>$val) {
      $char = $val[$i];
      if ($char==$nullchar) {
        continue;
      }
      if (array_key_exists($char, $found)) {
        return false; //this char was found before
      }
      $found[$char] = true;
    }
  }

  return true;
}


$input = [
  0 => ['X', 'Y', 'Z', 'I', 'J'],
  1 => ['X', 'Y', 'Z', 'I'],
  2 => ['X', 'Y', 'Z', 'I'],
  3 => ['X', 'Y', 'Z', 'I'],
  4 => ['X', 'Y', 'Z'],
  5 => ['X', 'Y', 'Z']
];

//generate large table
$genLength = 30; //max double alphabet
$innerLength = $genLength;
$input2 = [];
for($i=0;$i<$genLength;$i++) {
  $inner = [];

  if (rand(0, 1)==1) {
    $innerLength--;
  }

  for($c=0;$c<$innerLength;$c++) {
    $ascii = 65 + $c; //upper case
    if ($ascii>90) {
      $ascii += 6; //lower case
    }
    $inner[] = chr($ascii);
  }
  $input2[] = $inner;
}


//generate large table with different keys
$genLength = 10; //max double alphabet
$innerLength = $genLength;
$input3 = [];
for($i=0;$i<$genLength;$i++) {
  $inner = [];

  if (rand(0, 1)==1) {
    //comment out to make same length inner arrays, but perhaps more distinct values
    //$innerLength--;
  }

  $nr = 0;
  for($c=0;$c<$innerLength;$c++) {
    $ascii = 65 + $c + $nr; //upper case
    if ($ascii>90) {
      $ascii += 6; //lower case
    }
    //$inner[] = chr($ascii);
    $inner[] = $c+$nr+1;

    //increase nr?
    if (rand(0, 2)==1) {
      $nr++;
    }

  }
  $input3[] = $inner;
}


//generate table with numeric values, to show what happens
$genLength = 10; //max double alphabet
$innerLength = $genLength;
$input4 = [];
for($i=0;$i<$genLength;$i++) {
  $inner = [];

  for($c=0;$c<$innerLength;$c++) {
    $inner[] = $c+1;
  }
  $input4[] = $inner;
}


$input5 = [
  0 => ['X', 'Y'],
  1 => ['X', 'Y'],
  2 => ['X', 'Y'],
];

$input6 = [
  0 => ['X', 'Y', 'Z', 'I', 'J'],
  1 => ['X', 'Y', 'Z', 'I'],
  2 => ['X', 'Y', 'Z', 'I'],
  3 => ['X', 'Y', 'Z', 'I'],
  4 => ['X', 'Y', 'Z']
];

$input7 = [
  ['X', 'Y', 'A', 'B'],
  ['X', 'Y', 'A', 'C']
];

$input8 = [
  ['X', 'Y', 'A'],
  ['X', 'Y', 'B'],
  ['X', 'Y', 'C']
];

$input9 = [
  ['X', 'Z', 'Y', 'A', 'E', 'D'],
  ['X', 'Z', 'Y', 'A', 'B'],
  ['X', 'Z', 'Y', 'A', 'C'],
  ['X', 'Z', 'Y', 'A', 'D'],
  ['X', 'Z', 'Y', 'A', 'D'],
  ['X', 'Z', 'Y', 'A', 'D']
];

/* ACTUAL CODE */

CreateOutput($input, 1);

function CreateOutput($input, $loops=0) {


  echo '<h2>Input</h2>';
  ShowNice($input);


  //find all distinct chars
  //find maxlength for any inner array

  $distinct = [];
  $maxLength = 0;
  $minLength = -1;
  $rowCount = count($input);
  $flipped = [];
  $i = 1;
  foreach($input as $key=>$val) {
    if ($maxLength<count($val)) {
      $maxLength = count($val);
    }
    if ($minLength>count($val) || $minLength==-1) {
      $minLength = count($val);
    }
    foreach($val as $char) {
      if (!array_key_exists($char, $distinct)) {
        $distinct[$char] = $i;
        $i++;
      }
    }

    $flipped[$key] = array_flip($val);
  }

  //keep track of the count for actual chars
  $actualChars = $i-1;
  $nullchar = '_';
  //add null values to distinct
  if ($minLength!=$maxLength && count($distinct)>$maxLength) {
    $distinct[$nullchar] = $i; //now it only gets added when an inner array is shorter, not if all are the same size
    $i++;
  }

  //if the $distinct count is smaller than the rowcount, we need more distinct values
  $addForRowcount = (count($distinct)<$rowCount);
  while (count($distinct)<$rowCount) {
    $char = '#'.$i.'#';
    $distinct[$char] = $i;
    $i++;
  }


  //flip the distinct array to make the index the keys
  $distinct = array_flip($distinct);

  $keys = count($distinct);

  //create output
  $output = [];
  $start = 0;
  foreach($input as $key=>$val) {
    $inner = [];
    for ($i=1;$i<=$keys;$i++) {
      $index = $start + $i;
      if ($index>$keys) {
          $index -= $keys;
      }

      if ($index>$actualChars) {
        //just add the null char
        $inner[] = $nullchar;
      } else {
        $char = $distinct[$index];

        //check if the inner contains the char
        if (!array_key_exists($char, $flipped[$key])) {
          $char = $nullchar;
        }

        $inner[] = $char;
      }

    }
    $output[] = $inner;
    $start++;
  }

  echo '<h2>First output, unchecked</h2>';
  ShowNice($output);

  $newOutput = $output;
  for ($x=0;$x<=$loops;$x++) {
    $newOutput = MoveLeft($newOutput, $nullchar);
    $newOutput = MoveLeft($newOutput, $nullchar, true);
    $newOutput = SwitchChar($newOutput, $nullchar);
  }

  echo '<h2>New output</h2>';
  ShowNice($newOutput);
  //in $newoutput we moved all the invalid wildcards to the end
  //now we need to test if the last row has wildcards

  if (count($newOutput[0])<count($output[0])) {
    $output = $newOutput;
  }


  echo '<h2>Best result ('.(TestValid($output, $nullchar)?'VALID':'INVALID').')</h2>';
  ShowNice($output);

  return $output;
}

function MoveLeft($newOutput, $nullchar, $reverse=false) {
  //see if we can remove more wildcards
  $lastkey = count($newOutput[0])-1;
  $testing = true;
  while ($testing) {
    $testing = false; //we decide below if we go another round
    $test = $newOutput;

    $lastkey = count($newOutput[0])-1;

    $start = 0;
    $end = count($test);
    if ($reverse) {
      $start = count($test)-1;
      $end = -1;
    }

    for($key = $start;$key!=$end;$key += ($reverse?-1:1) ) {
      $val = $test[$key];
      $org = array_values($val);
      foreach($val as $i=>$char) {
        if ($char!=$nullchar) {
          continue; //we only test wildcards
        }


        $wildcardAtEnd=true;
        for($x=$i+1;$x<=$lastkey;$x++) {
          $nextChar = $val[$x];
          if ($nextChar!=$nullchar) {
            $wildcardAtEnd = false;
            break;
          }
        }


        if ($wildcardAtEnd===true) {
          continue; //the char next to it must not be wildcard
        }

        //remove the wildcard and add it to the end
        unset($val[$i]);
        $val[] = $nullchar;
        $test[$key] = array_values($val); //correct order

        if (TestValid($test, $nullchar)) {
          //we can keep the new one
          $newOutput = $test;
          $testing = true; //keep testing, but start over to make sure we dont miss anything
          break 2; //break both foreach, not while
        }

        $test[$key] = $org; //reset original values before remove for next test
      }
    }
  }

  $allWildCards = true;
  while ($allWildCards) {
    $lastkey = count($newOutput[0])-1;
    foreach($newOutput as $key=>$val) {
      if ($val[$lastkey]!=$nullchar)  {
        $allWildCards = false;
        break;
      }
    }
    if ($allWildCards) {
      foreach($newOutput as $key=>$val) {
        unset($val[$lastkey]);
        $newOutput[$key] = array_values($val);
      }
      $output = $newOutput;
    }
  }

  return $newOutput;
}

function SwitchChar($newOutput, $nullchar) {
  $switching = true;
  $switched = [];
  while($switching) {
    $switching = false;

    $test = $newOutput;
    $lastkey = count($newOutput[0])-1;

    foreach($test as $key=> $val) {
      foreach($val as $index=>$char) {
        $switched[$key][$index][$char] = true;//store the switches we make

        //see if can move the char somewhere else
        for($i=0;$i<=$lastkey;$i++)
        {
          if ($i==$index) {
            continue;//current pos
          }
          if (isset($switched[$key][$i][$char])) {
            continue; //been here before
          }

          $org = array_values($val);
          $switched[$key][$i][$char] = true;
          $t = $val[$i];
          $val[$index] = $t;
          $val[$i] = $char;
          $test[$key] = array_values($val);

          if (TestValid($test, $nullchar)) {
            //echo '<br />VALID: ' . $key . ' - ' . $index . ' - ' . $i . ' - ' . $t . ' - ' . $char;
            $newOutput = MoveLeft($test, $nullchar);
            $switching = true;
            break 3;//for and two foreach
          }

          //echo '<br />INVALID: ' . $key . ' - ' . $index . ' - ' . $i . ' - ' . $t . ' - ' . $char;
          $val = $org;
          $test[$key] = $org;
        }
      }
    }
  }

  return $newOutput;
}

Result (for $input8):

Input

   0 => ['X', 'Y', 'A']
   1 => ['X', 'Y', 'B']
   2 => ['X', 'Y', 'C']

   First output, unchecked

   0 => ['X', 'Y', 'A', '_', '_']
   1 => ['Y', '_', 'B', '_', 'X']
   2 => ['_', '_', 'C', 'X', 'Y']

   New output

   0 => ['X', 'Y', 'A', '_', '_']
   1 => ['Y', 'B', 'X', '_', '_']
   2 => ['C', 'X', 'Y', '_', '_']

   Best result (VALID)

   0 => ['X', 'Y', 'A']
   1 => ['Y', 'B', 'X']
   2 => ['C', 'X', 'Y']
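
When trying different loop counts, it helps to compare outputs by number rather than by eye. The sketch below simply counts the wildcards in an output. The name count_wildcards is made up for illustration, and '_' is assumed as the wildcard because that is what $nullchar is set to in the code above.

```php
<?php
// Hypothetical helper: counts wildcard cells in an output table.
function count_wildcards(array $output, string $nullchar = '_'): int
{
    $total = 0;
    foreach ($output as $row) {
        // array_keys() with a strict search value returns only the matching positions.
        $total += count(array_keys($row, $nullchar, true));
    }
    return $total;
}

$first = [
    ['X', 'Y', 'A', '_', '_'],
    ['Y', '_', 'B', '_', 'X'],
    ['_', '_', 'C', 'X', 'Y'],
];
$best = [
    ['X', 'Y', 'A'],
    ['Y', 'B', 'X'],
    ['C', 'X', 'Y'],
];
echo count_wildcards($first) . ' -> ' . count_wildcards($best) . PHP_EOL;
```

Applied to the tables above, the first output holds 6 wildcards and the best result holds none.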

Answered Nov 15 '22 12:11 by Hugo Delsing