Generating random numbers under very specific constraints

I am faced with the following programming problem. I need to generate n (a, b) tuples for which the sum of all a's is a given A, the sum of all b's is a given B, and for each tuple the ratio a / b is in the range (c_min, c_max). A / B is within the same range, too. I am also trying to make sure there is no bias in the result other than what is introduced by the constraints, and that the a / b values are more or less uniformly distributed in the given range.

Some clarifications and meta-constraints:

  • A, B, c_min, and c_max are given.
  • The ratio A / B is in the (c_min, c_max) range. This has to be so if the problem is to have a solution given the other constraints.
  • a and b are > 0 and real-valued (not restricted to integers).
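
The constraints listed above can be written as a small validity check. This is a minimal sketch and not part of the original question; the function name `satisfies_constraints` and the tolerance parameter `tol` are assumptions introduced here for illustration.

```python
import math

def satisfies_constraints(pairs, A, B, c_min, c_max, tol=1e-9):
    """Return True if the (a, b) pairs meet the constraints in the question.

    `pairs` is a list of (a, b) tuples. `tol` (an assumption, not from the
    question) is the relative tolerance used when comparing float sums.
    """
    # Every a and b must be strictly positive.
    if any(a <= 0 or b <= 0 for a, b in pairs):
        return False
    # Every ratio a / b must lie strictly inside (c_min, c_max).
    if any(not (c_min < a / b < c_max) for a, b in pairs):
        return False
    # The sums must match the targets; math.fsum reduces rounding error.
    sum_a = math.fsum(a for a, _ in pairs)
    sum_b = math.fsum(b for _, b in pairs)
    return (math.isclose(sum_a, A, rel_tol=tol)
            and math.isclose(sum_b, B, rel_tol=tol))
```

A check like this is handy for testing any candidate generator, since it is independent of how the tuples were produced.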

I am trying to implement this in Python but ideas in any language (English included) are much appreciated.

ktdrv asked Oct 26 '11 20:10




1 Answer

Lots of good ideas here. Thanks! Rossum's idea seemed the most straightforward implementation-wise so I went for it. Here is the code for posterity:

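import random

c_min = 0.25
c_max = 0.75
a_sum = 100.0
b_sum = 200.0
n = 1000

# Seed every tuple with the mean, so each ratio starts at a_sum / b_sum.
a = [a_sum / n] * n
b = [b_sum / n] * n

# good_enough(a, b) is the stopping test discussed below.
while not good_enough(a, b):
    # Move q from a[j] to a[i]; the sum of the a's is unchanged.
    i, j = random.sample(range(n), 2)
    # Limits on q that keep a[i] / b[i] within (c_min, c_max).
    lo_i, hi_i = c_min * b[i] - a[i], c_max * b[i] - a[i]
    # Limits on q that keep a[j] / b[j] within (c_min, c_max).
    lo_j, hi_j = a[j] - c_max * b[j], a[j] - c_min * b[j]
    q = random.uniform(max(lo_i, lo_j), min(hi_i, hi_j))
    a[i] += q
    a[j] -= q

    # Move q from b[j] to b[i]; the sum of the b's is unchanged.
    i, j = random.sample(range(n), 2)
    # Limits on q that keep a[i] / b[i] within (c_min, c_max).
    lo_i, hi_i = a[i] / c_max - b[i], a[i] / c_min - b[i]
    # Limits on q that keep a[j] / b[j] within (c_min, c_max).
    lo_j, hi_j = b[j] - a[j] / c_min, b[j] - a[j] / c_max
    q = random.uniform(max(lo_i, lo_j), min(hi_i, hi_j))
    b[i] += q
    b[j] -= q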
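
The limits on q in the first half of the loop come from requiring both ratios to stay in range after the transfer. The sketch below spells out that derivation for the a values only; the helper name `a_transfer_bounds` is an assumption made for illustration and does not appear in the answer's code.

```python
def a_transfer_bounds(a_i, b_i, a_j, b_j, c_min, c_max):
    """Range of q for which moving q from a_j to a_i keeps both ratios valid.

    From c_min < (a_i + q) / b_i < c_max:
        c_min * b_i - a_i < q < c_max * b_i - a_i
    From c_min < (a_j - q) / b_j < c_max:
        a_j - c_max * b_j < q < a_j - c_min * b_j
    The usable range is the intersection of the two intervals.
    """
    lower = max(c_min * b_i - a_i, a_j - c_max * b_j)
    upper = min(c_max * b_i - a_i, a_j - c_min * b_j)
    return lower, upper

# Seed values from the answer: every tuple starts at (0.1, 0.2).
lower, upper = a_transfer_bounds(0.1, 0.2, 0.1, 0.2, 0.25, 0.75)
print(lower, upper)  # roughly -0.05 and 0.05
```

Because q = 0 leaves a valid state unchanged, the interval always contains 0 as long as both ratios are in range before the step. The limits for the b values follow the same way, with the inequalities divided through by c_min and c_max instead.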

The good_enough(a, b) function can be a lot of things. I tried:

  • Standard deviation, which is hit or miss, as you don't know what is a good enough value.
  • Kurtosis, where a large negative value would be nice. However, it is relatively slow to calculate and is undefined with the seed values of (a_sum / n, b_sum / n) (though that's trivial to fix).
  • Skewness, where a value close to 0 is desirable. But it has the same drawbacks as kurtosis.
  • A number of iterations proportional to n. 2n sometimes wasn't enough, and n^2 is a little bit of overkill and is, well, quadratic.
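
For the skewness and kurtosis options above, a pure-Python sketch might look like the following. It assumes the statistics are taken over the a / b ratios and uses population moments; the function name `skew_and_excess_kurtosis` is hypothetical, and the answer does not say which estimator was actually used.

```python
import math

def skew_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) using population moments.

    Returns None when the variance is zero, which is the case for the
    seed values, where every ratio is identical.
    """
    n = len(values)
    mean = math.fsum(values) / n
    m2 = math.fsum((v - mean) ** 2 for v in values) / n
    if m2 == 0.0:
        return None
    m3 = math.fsum((v - mean) ** 3 for v in values) / n
    m4 = math.fsum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0
```

It could be called as `skew_and_excess_kurtosis([x / y for x, y in zip(a, b)])`. For reference, a continuous uniform distribution has skewness 0 and excess kurtosis -1.2, so values near those would suggest the ratios are close to uniform.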

Ideally, a heuristic using a combination of skewness and kurtosis would be best but I settled for making sure each value has been changed from the initial (again, as rossum suggested in a comment). Though there is no theoretical guarantee that the loop will complete, it seemed to work well enough for me.
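
The stopping rule described above could be sketched as follows. This is one possible reading, not the author's exact code: it treats a value as "changed" once its index has taken part in a transfer, and it adds a `max_steps` cap (a hypothetical safeguard) so the loop is guaranteed to end. The function name `generate_pairs` and the `seed` parameter are likewise assumptions, and n must be at least 2.

```python
import random

def generate_pairs(n, a_sum, b_sum, c_min, c_max, seed=None, max_steps=None):
    """Redistribute a and b between random index pairs until every value
    has taken part in at least one transfer (or max_steps is reached)."""
    rng = random.Random(seed)
    a = [a_sum / n] * n
    b = [b_sum / n] * n
    untouched_a, untouched_b = set(range(n)), set(range(n))
    if max_steps is None:
        max_steps = 100 * n  # hypothetical safety cap, not from the answer
    for _ in range(max_steps):
        if not untouched_a and not untouched_b:
            break
        # Transfer q from a[j] to a[i], keeping both ratios in range.
        i, j = rng.sample(range(n), 2)
        lo = max(c_min * b[i] - a[i], a[j] - c_max * b[j])
        hi = min(c_max * b[i] - a[i], a[j] - c_min * b[j])
        q = rng.uniform(lo, hi)
        a[i] += q
        a[j] -= q
        untouched_a -= {i, j}
        # Transfer q from b[j] to b[i], again keeping both ratios in range.
        i, j = rng.sample(range(n), 2)
        lo = max(a[i] / c_max - b[i], b[j] - a[j] / c_min)
        hi = min(a[i] / c_min - b[i], b[j] - a[j] / c_max)
        q = rng.uniform(lo, hi)
        b[i] += q
        b[j] -= q
        untouched_b -= {i, j}
    return list(zip(a, b))
```

For example, `generate_pairs(1000, 100.0, 200.0, 0.25, 0.75)` uses the parameters from the answer. Stopping as soon as every index has been touched is cheap, but it says nothing about how uniform the ratios are, so running extra iterations may still be worthwhile.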

ktdrv answered Oct 11 '22 02:10
