Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Weighted random pick

Tags:

php

I have a set of items. I need to randomly pick one. The problem is that they each have a weight of 1-10. A weight of 2 means that the item is twice as likely to be picked than a weight of 1. A weight of 3 is three times as likely.

I currently fill an array with each item. If the weight is 3, I put three copies of the item in the array. Then, I pick a random item.

My method is fast, but uses a lot of memory. I am trying to think of a faster method, but nothing comes to mind. Anyone have a trick for this problem?

EDIT: My Code...

Apparently, I wasn't clear. I do not want to use (or improve) my code. This is what I did.

//Given an array $a where $a[0] is an item name and $a[1] is the weight from 1 to 100.
$b = array();
foreach($a as $t)
    $b = array_merge($b, array_fill(0,$t[1],$t));
$item = $b[array_rand($b)];

This required me to check every item in $a and uses max_weight/2*size of $a memory for the array. I wanted a COMPLETELY DIFFERENT algorithm.

Further, I asked this question in the middle of the night using a phone. Typing code on a phone is nearly impossible because those silly virtual keyboards simply suck. It auto-corrects everything, ruining any code I type.

An yet further, I woke up this morning with an entirely new algorithm that uses virtual no extra memory at all and does not require checking every item in the array. I posted it as an answer below.

like image 494
kainaw Avatar asked Feb 11 '23 08:02

kainaw


2 Answers

This ones your huckleberry.

  $arr = array(
    array("val" => "one", "weight" => 1),
    array("val" => "two", "weight" => 2),
    array("val" => "three", "weight" => 3),
    array("val" => "four", "weight" => 4)
  );

  $weight_sum = 0;
  foreach($arr as $val)
  {
    $weight_sum += $val['weight'];
  }

  $r = rand(1, $weight_sum);
  print "random value is $r\n";

  for($i = 0; $i < count($arr); $i++)
  {
    if($r <= $arr[$i]['weight'])
    {
      print "$r <= {$arr[$i]['weight']}, this is our match\n";
      print $arr[$i]['val'] . "\n";
      break;
    }
    else
    {
      print "$r > {$arr[$i]['weight']}, subtracting weight\n";
      $r -= $arr[$i]['weight'];
      print "new \$r is $r\n";
    }
  }

No need to generate arrays containing an item for every weight, no need to fill an array with n elements for a weight of n. Just generate a random number between 1 and total weight, then loop through the array until you find a weight less than your random number. If it isn't less than the number, subtract that weight from the random and continue.

Sample output:

# php wr.php
random value is 8
8 > 1, subtracting weight
new $r is 7
7 > 2, subtracting weight
new $r is 5
5 > 3, subtracting weight
new $r is 2
2 <= 4, this is our match
four

This should also support fractional weights.

modified version to use array keyed by weight, rather than by item

  $arr2 = array(
  );

  for($i = 0; $i <= 500000; $i++)
  {
    $weight = rand(1, 10);
    $num = rand(1, 1000);
    $arr2[$weight][] = $num;
  }

  $start = microtime(true);

  $weight_sum = 0;
  foreach($arr2 as $weight => $vals) {
    $weight_sum += $weight * count($vals);
  }

  print "weighted sum is $weight_sum\n";

  $r = rand(1, $weight_sum);
  print "random value is $r\n";
  $found = false;
  $elem = null;

  foreach($arr2 as $weight => $vals)
  {
    if($found) break;
    for($j = 0; $j < count($vals); $j ++)
    {
      if($r < $weight)
      {
        $elem = $vals[$j];
        $found = true;
        break;
      }
      else
      {
        $r -= $weight;
      }
    }
  }
  $end = microtime(true);

  print "random element is: $elem\n";
  print "total time is " . ($end - $start) . "\n";

With sample output:

# php wr2.php
weighted sum is 2751550
random value is 345713
random element is: 681
total time is 0.017189025878906

measurement is hardly scientific - and fluctuates depending on where in the array the element falls (obviously) but it seems fast enough for huge datasets.

like image 154
pala_ Avatar answered Feb 13 '23 04:02

pala_


This way requires two random calculations but they should be faster and require about 1/4 of the memory but with some reduced accuracy if weights have disproportionate counts. (See Update for increased accuracy at the cost of some memory and processing)

Store a multidimensional array where each item is stored in the an array based on its weight:

$array[$weight][] = $item;
// example: Item with a weight of 5 would be $array[5][] = 'Item'

Generate a new array with the weights (1-10) appearing n times for n weight:

foreach($array as $n=>$null) {
  for ($i=1;$i<=$n;$i++) {
    $weights[] = $n;
  }
}

The above array would be something like: [ 1, 2, 2, 3, 3, 3, 4, 4, 4, 4 ... ]

First calculation: Get a random weight from the weighted array we just created

$weight = $weights[mt_rand(0, count($weights)-1)];

Second calculation: Get a random key from that weight array

$value = $array[$weight][mt_rand(0, count($array[$weight])-1)];

Why this works: You solve the weighted issue by using the weighted array of integers we created. Then you select randomly from that weighted group.


Update: Because of the possibility of disproportionate counts of items per weight, you could add another loop and array for the counts to increase accuracy.

foreach($array as $n=>$null) {
  $counts[$n] = count($array[$n]);
}

foreach($array as $n=>$null) {
  // Calculate proportionate weight (number of items in this weight opposed to minimum counted weight)
  $proportion = $n * ($counts[$n] / min($counts));
  for ($i=1; $i<=$proportion; $i++) {
    $weights[] = $n;
  }
}

What this does is if you have 2000 10's and 100 1's, it'll add 200 10's (20 * 10, 20 because it has 20x the count, and 10 because it is weighted 10) instead of 10 10's to make it proportionate to how many are in there opposed the minimum weight count. So to be accurate, instead of adding one for EVERY possible key, you are just being proportionate based on the MINIMUM count of weights.

like image 25
Devon Avatar answered Feb 13 '23 03:02

Devon