
Distribute a large value among a small number of elements

I need to distribute a large integer budget randomly among a small array of n elements, such that every element follows the same distribution, the elements sum to budget, and each element gets at least min.

I have an algorithm that runs in O(budget):

private int[] distribute(int budget, int n, int min) {
  int[] subBudgets = new int[n];
  for (int i = 0; i < n; i++) {
    subBudgets[i] = min;
  }
  budget -= n * min;
  while (budget > 0) {
    subBudgets[random.nextInt(n)]++;
    budget--;
  }
  return subBudgets;
}

However, this becomes very expensive as budget grows. Is there an algorithm that runs in O(n), or even better?

Zebra Propulsion Lab asked Oct 03 '22 15:10


2 Answers

First generate n random numbers x[i] and sum them up. Dividing budget by that sum gives a scale factor k. Then assign k*x[i] to each array element. It is simple and O(n).

If each element must receive at least min, you can modify the algorithm above by first filling every element with min (or by using k*x[i] + min) and subtracting n*min from budget before you start.

If you need to work with integers, use a real-valued k and round k*x[i]. You then have to track the accumulated rounding error and add it to (or subtract it from) the calculated value once it reaches a whole unit. You also have to assign whatever remains to the last element so that the total equals budget exactly.

P.S.: Note that this algorithm can be used with ease in pure functional languages. That is why I like this whole family of algorithms, which generate a random number for each member and then do some processing afterward. An example implementation in Erlang:

-module(budget).

-export([distribute/2, distribute/3]).

distribute(Budget, N) ->
  distribute(Budget, N, 0).

distribute(Budget, N, Min) when
    is_integer(Budget), is_integer(N), N > 0,
    is_integer(Min), Min >= 0, Budget >= N*Min ->
  % rand replaces the deprecated random module
  Xs = [rand:uniform() || _ <- lists:seq(1,N) ],
  Rest = Budget - N*Min,
  K = Rest / lists:sum(Xs),
  F = fun(X, {Bgt, Err, Acc}) ->
      Y = X*K + Err,
      Z = round(Y),
      {Bgt - Z, Y - Z, [Z + Min | Acc]}
  end,
  {Bgt, _, T} = lists:foldl(F, {Rest, 0.0, []}, tl(Xs)),
  [Bgt + Min | T].

The same algorithm in Java:

private int[] distribute(int budget, int n, int min) {
  int[] subBudgets = new int[n];
  double[] rands = new double[n];
  double k, err = 0, sum = 0;
  budget -= n * min;
  for (int i = 0; i < n; i++) {
    rands[i] = random.nextDouble();
    sum += rands[i];
  }
  k = (double)budget/sum;
  for (int i = 1; i < n; i++) {
    double y = k*rands[i] + err;
    int z = (int) Math.floor(y + 0.5); // round to the nearest integer
    subBudgets[i] = min + z;
    budget -= z;
    err = y - z;
  }
  subBudgets[0] = min + budget;
  return subBudgets;
}
Hynek -Pichi- Vychodil answered Oct 13 '22 00:10


Sampling from the Multinomial Distribution

The way you are currently distributing the dollars left over after min has been given to each subbudget amounts to performing a fixed number, budget, of random "trials": on each trial you randomly select one of n categories, and you want to know how many times each category is selected. This is modeled by a multinomial distribution with the following parameters:

  • Number of trials (called n on the Wikipedia page): budget (the amount left after subtracting n*min)
  • Number of categories (called k on the Wikipedia page): n
  • Probability of category i in each trial, for 1 <= i <= n: 1/n

The way you are currently doing it is a good way if the number of trials is around the same size as the number of categories, or less. But if the budget is large, there are other, more efficient ways of sampling from this distribution. The easiest way I know of is to notice that a multinomial distribution with k categories can be repeatedly decomposed into binomial distributions by grouping categories together. Instead of directly asking how many selections there are for each of the k categories, we express this as a sequence of questions. First: "How do we split the budget between the first category and the other k-1?" Next: "How do we split the remainder between the second category and the other k-2?" And so on.

So the top-level binomial has category (subbudget) 1 vs. everything else. Decide the number of dollars that go to subbudget 1 by taking 1 sample from a binomial distribution with budget trials and success probability p = 1/n (how to do this is described here); this will produce some number 0 <= x[1] <= budget. To find the number of dollars that go to subbudget 2, take 1 sample from a binomial distribution on the remaining money, i.e. with budget - x[1] trials and p = 1/(n-1). After getting subbudget 2's amount x[2], subbudget 3's will be found using budget - x[1] - x[2] trials and p = 1/(n-2), and so on. The last subbudget has p = 1, so it simply receives whatever is left.
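The chain of binomial draws described above could be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the BinomialSampler interface and the class name MultinomialSplit are hypothetical names introduced here, and the naive sampler included to make the sketch runnable simply repeats the trials one by one, so it is no faster than the original loop.

```java
import java.util.Random;

class MultinomialSplit {

    // Hypothetical plug-in point (not from the answer): returns one sample
    // from Binomial(trials, p).
    interface BinomialSampler {
        int sample(int trials, double p);
    }

    // Naive exact sampler that runs the trials one by one, O(trials).
    // It only stands in for a fast sampler so that the sketch is runnable.
    static BinomialSampler naive(Random random) {
        return (trials, p) -> {
            int successes = 0;
            for (int t = 0; t < trials; t++) {
                if (random.nextDouble() < p) {
                    successes++;
                }
            }
            return successes;
        };
    }

    static int[] distribute(int budget, int n, int min, BinomialSampler sampler) {
        if (n <= 0 || (long) budget < (long) n * min) {
            throw new IllegalArgumentException("need n > 0 and budget >= n * min");
        }
        int[] subBudgets = new int[n];
        int remaining = budget - n * min; // dollars left after everyone gets min
        for (int i = 0; i < n - 1; i++) {
            // Category i versus the n - i categories still open (itself included).
            int x = sampler.sample(remaining, 1.0 / (n - i));
            subBudgets[i] = min + x;
            remaining -= x;
        }
        subBudgets[n - 1] = min + remaining; // last category: p = 1, takes the rest
        return subBudgets;
    }

    public static void main(String[] args) {
        int[] result = distribute(1000, 7, 10, naive(new Random()));
        System.out.println(java.util.Arrays.toString(result));
    }
}
```

The loop makes only n - 1 binomial draws, so the overall cost is governed by the sampler. The speed-up over the original approach appears only once naive is replaced by a binomial sampler whose cost does not grow with the number of trials.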

j_random_hacker answered Oct 13 '22 00:10