Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Generate random numbers in a specific range with a standard deviation?

Tags:

java

random

I already know how to generate random numbers in a range. I can do this by using

rand.nextInt((max - min) + 1) + min;

The problem is that I would also like to set a standard deviation for these numbers. The numbers also need to be positive and they are not between 0 and 1

EDIT I removed the ThreadLocalRandom class because I cannot set a seed in that class and these random numbers should be reproducible in a different system.

like image 475
statboy Avatar asked Aug 25 '16 12:08

statboy


1 Answers

Choosing the standard deviation (or variance) for a bounded distribution can only be done subject to constraints that depend on the selected distribution and the bounds (min, max) of your interval. Some distributions may allow you to make the variance arbitrarily small (e.g. the Beta distribution), other distributions (like the Uniform distribution) don't allow any flexibility once the bounds (min, max) have been set. In any case, you'll never be able to make the variance arbitrarily large -- the bounds do prevent this (they'll always enter the expression for the distribution's variance).

I'll illustrate this for a very simple example that can be implemented without requiring any 3rd party libraries. Let's assume you want a symmetric distribution on the interval (min, max), symmetry implying that the mean E(X) of the distribution is located in the middle of the interval: E(X) = (min + max)/2.

Using Random's nextDouble as in x = a + (b - a) * rnd.nextDouble() will give you a uniformly distributed random variable in the interval a <= x < b that has a fixed variance Var(X) = (b - a)^2 / 12 (not what we want).

OTH, simulating a symmetric triangular distribution on the same interval (a, b) would give us a random variate whith the same mean but with only half the variance: Var(X) = (b - a)^2 / 24 (also fixed, so also not what we want).

A symmetric trapezoidal distribution with parameters (a < b < c < d) lies somewhere in the middle of a Uniform and a triangular distribution on the interval (a, d). The symmetry condition implies d - c = b - a, in the following I'll refer to the distance b - a as x or as "displacement" (I've made up that name, it's not a technical term).

If you let x approach 0.0 from above, the trapezoidal will begin to look very similar to a uniform distribution and its variance will tend to the maximum possible value (d - a)^2 / 12. If you let x approach the maximum possible value (d - a)/2 from below, the trapezoidal will look very similar to a symmetric triangle distribution and its variance will approach the minimum possible value of (d - a)^2 / 24) (but note that we should stay away a little from these extreme values in order not to break the variance formula or our algorithm for the trapezoidal).

So, the idea is to construct a trapezoidal distribution with a value for x that yields the standard deviation you want, given the condition that your targeted standard deviation must lie inside the open range (roughly) given by (0.2041(d - a), 0.2886(d - a)). For convenience let's assume that a = min = 2.0 and d = max = 10.0 which gives us this range of possible stddevs: (1.6328, 2.3088). Let's further assume that we want to construct a distribution with a stddev of 2.0 (which, of course, has to be in the admissible range).

Solving this requires 3 steps:

1) we need to have a formula for the variance given min, max and an admissible value for the displacement x

2) we need to somehow "invert" this expression to give us the value of x for our target variance

3) once we know the value of x we must construct a random variable that has a symmetric trapezoidal distribution with the parameters (min, max, x)

Step 1:

/**
 * Variance of a symmetric trapezoidal distribution with parameters
 * {@code a < b < c < d} and the length of {@code d - c = b - a}
 * (by symmetry) identified by {@code x}.
 * 
 * @param a support lower bound
 * @param d support upper bound
 * @param x length of {@code d - c = b - a}, constrained to lie in the open
 *          interval {@code (0, (d-a)/2)}
 * @return variance of the symmetric trapezoidal distribution defined by
 *         the triple {@code (a, d, x)}
 */
static double varSymTrapezoid(double a, double d, double x) {
    if (a <= 0.0 || d <= 0.0 || a >= d) {
        throw new IllegalArgumentException();
    }
    if (x <= 0.0 || x >= (d - a) / 2) {
        throw new IllegalArgumentException();
    }
    double b = a + x;
    double c = d - x;
    double b3 = pow(b, 3);
    double c3 = pow(c, 3);
    double ex2p1 = pow(b, 4) / 4 - a * b3 / 3 + pow(a, 4) / 12;
    double ex2p2 = (c3 / 3 - b3 / 3) * (d - c);
    double ex2p3 = pow(c, 4) / 4 - d * c3 / 3 + pow(d, 4) / 12;
    double ex2 = (ex2p1 + ex2p2 + ex2p3) / ((d - b) * (d - c));
    return ex2 - pow((a + d) / 2, 2);
}

Note that this formula is only valid for symmetric trapezoidal distributions. As an example, if you call this method with a displacement of 2.5 (varSymTrapezoid(2.0, 10.0, 2.5)) it'd give you back a variance of approximately 3.0416 which is too low (we need 4.0), meaning that a displacement of 2.5 is too much (higher displacements give lower variances).

The variance expression is a 4th order polynomial in x that I'd rather not want to solve analytically. However, for a target x in the admissible range this expression is monotonically decreasing, so we can construct a function that crosses zero for our target variance and solve this by simple bisection. This is

Step 2:

/**
 * Find the displacement {@code x} for the given {@code stddev} by simple
 * bisection.
 * @param min support lower bound
 * @param max support upper bound
 * @param stddev the standard deviation we want
 * @return the length {@code x} of {@code d - c = b - a} that yields a
 * standard deviation roughly equal to {@code stddev}
 */
static double bisect(double min, double max, double stddev) {
    final double eps = 1e-4;
    final double var = pow(stddev, 2);
    int iters = 0;
    double a = eps;
    double b = (max - min) / 2 - eps;
    double x = eps;
    double dx = b - a;

    while (abs(dx) > eps && iters < 150 && eval(min, max, x, var) != 0.0) {
        x = ((a + b) / 2);
        if ((eval(min, max, a, var) * eval(min, max, x, var)) < 0.0) {
            b = x;
            dx = b - a;
        } else {
            a = x;
            dx = b - a;
        }
        iters++;
    }
    if (abs(eval(min, max, x, var)) > eps) {
        throw new RuntimeException("failed to find solution");
    }
    return x;
}

/**
 * Function whose root we want to find.
 */
static double eval(double min, double max, double x, double var) {
    return varSymTrapezoid(min, max, x) - var;
}

Calling the bisect method with the desired value 2.0 for the standard deviation (bisect(2.0, 10.0, 2.0)) gives us the needed displacement: ~ 1.1716. Now that the value for x is known, the final thing we have to do is to construct a suitably distributed random variable which is

Step 3:

It is a well-known fact of probability theory that the sum of two independent uniformly distributed random variables X1 ~ U[a1, b1] and X2 ~ U[a2, b2] is a symmetric trapezoidally distributed random variable on the interval [a1 + a2, b1 + b2] provided that either a1 + b2 < a2 + b1 (case 1) or a2 + b1 < a1 + b2 (case 2). We have to avoid the case a2 + b1 = a1 + b2 (case 3) since then the sum has a symmetric triangular distribution which we don't want.

We'll choose case 1 (a1 + b2 < a2 + b1). In that case the length of b2 - a2 will be equal to the "displacement" x.

So, all we have to do is to choose the interval boundaries a1, a2, b1 and b2 such that a1 + a2 = min, b1 + b2 = max, b2 - a2 = x and the above inequality is fullfilled:

/**
 * Return a pseudorandom double for the symmetric trapezoidal distribution
 * defined by the triple {@code (min, max, x)}
 * @param min support lower bound
 * @param max support upper bound
 * @param x length of {@code max - c = b - min}, constrained to lie in the
 *          open interval {@code (0, (max-min)/2)}
 */
public static double symTrapezoidRandom(double min, double max, double x) {
    final double a1 = 0.5 * min;
    final double a2 = a1;

    final double b1 = max - a2 - x;
    final double b2 = a2 + x;

    if ((a1 + b2) >= (a2 + b1)) {
        throw new IllegalArgumentException();
    }

    double u = a1 + (b1 - a1) * rnd.nextDouble();
    double v = a2 + (b2 - a2) * rnd.nextDouble();
    return u + v;
}

Calling symTrapezoidRandom(2.0, 10.0, 1.1716) repeatedly gives you random variables that have the desired distribution.

You could do very similar things with other, more sophisticated, distributions like the Beta. This would give you other (usually more flexible) bounds on the admissible variances but you'd need a 3rd party library like commons.math for that.

abs, pow, sqrt in the code refer to the statically imported java.lang.Math methods and rnd is an instance of java.util.Random.

like image 82
Stefan Zobel Avatar answered Nov 16 '22 00:11

Stefan Zobel