My goal is to test whether a class sets one of its attributes to a random integer value. I found an algorithm for a chi square test online and decided to put it to use. I'm quite surprised by the result: the larger I make my sample size, the less likely the test seems to be to pass. I should say that I'm by no means a statistics expert (which might be self-evident by my asking this question) so I might be getting some things wrong here.
Results of tests while variating only the final int SIZE
(in UserTest
). Every test ran 30 times:
SIZE avg results
11 25.4 26, 25, 22, 24, 30
20 25 26, 26, 24, 22, 27
30 24 24, 22, 24, 26, 24
100 19.4 17, 23, 20, 18, 19
200 16.2 15, 18, 18, 15, 15
1000 13.2 13, 13, 14, 13, 13
10000 10 14, 7, 8, 10, 11
While it's not strictly necessary for me to have true randomness in this context, I'm still curious as to what the problem is. Is this a faulty algorithm in itself, my faulty use of it, a natural consequence of "making the test harder" (statistics noob, remember), or am I pushing up against the boundaries of Java's pseudorandom generator ?
the domain class:
public class User
{
public static final int MINIT = 20;
public static final int MAXIT = 50;
private int iterations;
public void setIterations()
{
Random random = new Random();
setIterations(MINIT+random.nextInt(MAXIT-MINIT));
}
private void setIterations(int iterations) {
this.iterations = iterations;
}
}
the test class:
public class UserTest {
private User user = new User();
@Test
public void testRandomNumbers() {
int results = 0;
final int TIMES = 30;
for(int i = 0; i < TIMES; i++)
{
if (randomNumbersRun())
{
results++;
}
}
System.out.println(results);
Assert.assertTrue(results >= TIMES * 80 / 100);
}
private boolean randomNumbersRun()
{
ArrayList<Integer> list = new ArrayList<Integer>();
int r = User.MAXIT - User.MINIT;
final int SIZE = 11;
for (int i = 0; i < r*SIZE; i++) {
user.setIterations();
list.add(user.getIterations());
}
return Statistics.isRandom(list, r);
}
}
The chi square algorithm:
/**
* source: http://en.wikibooks.org/wiki/Algorithm_Implementation/Pseudorandom_Numbers/Chi-Square_Test
* changed parameter to ArrayList<Number> for generalization
*/
public static boolean isRandom(ArrayList<? extends Number> randomNums, int r) {
//According to Sedgewick: "This is valid if N is greater than about 10r"
if (randomNums.size() <= 10 * r) {
return false;
}
//PART A: Get frequency of randoms
Map<Number, Integer> ht = getFrequencies(randomNums);
//PART B: Calculate chi-square - this approach is in Sedgewick
double n_r = (double) randomNums.size() / r;
double chiSquare = 0;
for (int v : ht.values()) {
double f = v - n_r;
chiSquare += f * f;
}
chiSquare /= n_r;
//PART C: According to Swdgewick: "The statistic should be within 2(r)^1/2 of r
//This is valid if N is greater than about 10r"
return Math.abs(chiSquare - r) <= 2 * Math.sqrt(r);
}
/**
* @param nums an array of integers
* @return a Map, key being the number and value its frequency
*/
private static Map<Number, Integer> getFrequencies(ArrayList<? extends Number> nums) {
Map<Number, Integer> freqs = new HashMap<Number, Integer>();
for (Number x : nums) {
if (freqs.containsKey(x)) {
freqs.put(x, freqs.get(x) + 1);
} else {
freqs.put(x, 1);
}
}
return freqs;
}
}
It looks like your test has smoked out an error in the implementation of Random
on your platform. According to the documentation, invoking new Random()
without a parameter should
set the seed of the random number generator to a value very likely to be distinct from any other invocation of this constructor.
In practice this is implemented by adding a "randomizing coefficient" to the current time in nanoseconds, followed by changing the randomizing coefficient to a new value based on some simple algorithm. However, the exact implementation is specific to your Java platform.
Since you obtain only the first pseudorandom number before discarding your Random
object, your test is dependent on the outcome of a different random algorithm: rather than testing how good is the Random
itself, your algorithm is effectively testing how good is its random seed selector.
Since making Random
a static
member of the User
class has fixed the problem, and because the outcome has been different on another platform (link to code on ideone showing different results) it appears that
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With