Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Unit testing random numbers java

My goal is to test whether a class sets one of its attributes to a random integer value. I found an algorithm for a chi square test online and decided to put it to use. I'm quite surprised by the result: the larger I make my sample size, the less likely the test seems to be to pass. I should say that I'm by no means a statistics expert (which might be self-evident by my asking this question) so I might be getting some things wrong here.

Results of tests while variating only the final int SIZE (in UserTest). Every test ran 30 times:

SIZE    avg     results

11      25.4    26, 25, 22, 24, 30

20      25      26, 26, 24, 22, 27

30      24      24, 22, 24, 26, 24

100     19.4    17, 23, 20, 18, 19

200     16.2    15, 18, 18, 15, 15

1000    13.2    13, 13, 14, 13, 13

10000   10      14, 7, 8, 10, 11

While it's not strictly necessary for me to have true randomness in this context, I'm still curious as to what the problem is. Is this a faulty algorithm in itself, my faulty use of it, a natural consequence of "making the test harder" (statistics noob, remember), or am I pushing up against the boundaries of Java's pseudorandom generator ?

the domain class:

public class User
{
  public static final int MINIT = 20;
  public static final int MAXIT = 50;
  private int iterations;

  public void setIterations()
  {
    Random random = new Random();
    setIterations(MINIT+random.nextInt(MAXIT-MINIT));
  }

  private void setIterations(int iterations) {
    this.iterations = iterations;
  }
}

the test class:

public class UserTest {

  private User user = new User();

  @Test
  public void testRandomNumbers() {
    int results = 0;
    final int TIMES = 30;
    for(int i = 0; i < TIMES; i++)
    {
      if (randomNumbersRun())
      {
        results++;
      }
    }
    System.out.println(results);
    Assert.assertTrue(results >= TIMES * 80 / 100);
  }

  private boolean randomNumbersRun()
  {
    ArrayList<Integer> list = new ArrayList<Integer>();
    int r = User.MAXIT - User.MINIT;
    final int SIZE = 11;
    for (int i = 0; i < r*SIZE; i++) {
      user.setIterations();
      list.add(user.getIterations());
    }
    return Statistics.isRandom(list, r);
  }
}

The chi square algorithm:

  /**
   * source: http://en.wikibooks.org/wiki/Algorithm_Implementation/Pseudorandom_Numbers/Chi-Square_Test
   * changed parameter to ArrayList<Number> for generalization
   */
  public static boolean isRandom(ArrayList<? extends Number> randomNums, int r) {
    //According to Sedgewick: "This is valid if N is greater than about 10r"
    if (randomNums.size() <= 10 * r) {
      return false;
    }

    //PART A: Get frequency of randoms
    Map<Number, Integer> ht = getFrequencies(randomNums);

    //PART B: Calculate chi-square - this approach is in Sedgewick
    double n_r = (double) randomNums.size() / r;
    double chiSquare = 0;

    for (int v : ht.values()) {
      double f = v - n_r;
      chiSquare += f * f;
    }
    chiSquare /= n_r;

    //PART C: According to Swdgewick: "The statistic should be within 2(r)^1/2 of r
    //This is valid if N is greater than about 10r"
    return Math.abs(chiSquare - r) <= 2 * Math.sqrt(r);
  }

  /**
   * @param nums an array of integers
   * @return a Map, key being the number and value its frequency
   */
  private static Map<Number, Integer> getFrequencies(ArrayList<? extends Number> nums) {
    Map<Number, Integer> freqs = new HashMap<Number, Integer>();

    for (Number x : nums) {
      if (freqs.containsKey(x)) {
        freqs.put(x, freqs.get(x) + 1);
      } else {
        freqs.put(x, 1);
      }
    }

    return freqs;
  }
}
like image 912
blagae Avatar asked Oct 01 '22 20:10

blagae


1 Answers

It looks like your test has smoked out an error in the implementation of Random on your platform. According to the documentation, invoking new Random() without a parameter should

set the seed of the random number generator to a value very likely to be distinct from any other invocation of this constructor.

In practice this is implemented by adding a "randomizing coefficient" to the current time in nanoseconds, followed by changing the randomizing coefficient to a new value based on some simple algorithm. However, the exact implementation is specific to your Java platform.

Since you obtain only the first pseudorandom number before discarding your Random object, your test is dependent on the outcome of a different random algorithm: rather than testing how good is the Random itself, your algorithm is effectively testing how good is its random seed selector.

Since making Random a static member of the User class has fixed the problem, and because the outcome has been different on another platform (link to code on ideone showing different results) it appears that

  • The algorithm for producing pseudorandom numbers on your platform is better than the algorithm for producing seeds for the pseudorandom number generator on your platform
  • The above is specific to your platform, because on other platforms the seed generator is also reasonably random.
like image 178
Sergey Kalinichenko Avatar answered Oct 06 '22 20:10

Sergey Kalinichenko