Using .txt file containing random numbers with the diehard test suite

Tags: random, testing

I have several .txt files, each containing a large number of integers (approx. 2.5 million) generated by various RNGs. I want to use the diehard test suite to test these RNGs.

The .txt files look like this:

#==============================================
# generator Park       seed = 1
#=============================================
type: d
count: 2500000
numbit: 32
16807
282475249

This is followed, of course, by more integers. I use the following command to run diehard with this .txt file:

dieharder -f randdata.txt -a -g 202

My question is: is my .txt file correct (in particular the first few lines), and why are these lines necessary? I am asking because every .txt file generated by my RNGs (some good, some bad) fails almost every test, and I am wondering whether I made a mistake in passing the .txt file to diehard or whether my RNGs are simply all bad.

user111199 asked Oct 05 '15 17:10


1 Answer

Yes, that input file looks correct. However, several of the dieharder tests fail even with 10M inputs generated by dieharder's own generator:

$ dieharder -o -f example.input -t 10000000 # Generate an input file
$ head -n 10 example.input
#==================================================================
# generator mt19937  seed = 3423143424
#==================================================================
type: d
count: 10000000
numbit: 32
2310531048
 808929469
2423056114
4237891648
$ dieharder -a -g 202 -f example.input 
#=============================================================================#
#            dieharder version 3.31.1 Copyright 2003 Robert G. Brown          #
#=============================================================================#
   rng_name    |           filename             |rands/second|
     file_input|                   example.input|  2.50e+06  |
#=============================================================================#
        test_name   |ntup| tsamples |psamples|  p-value |Assessment
#=============================================================================#
# The file file_input was rewound 1 times
   diehard_birthdays|   0|       100|     100|0.07531570|  PASSED  
# The file file_input was rewound 11 times
      diehard_operm5|   0|   1000000|     100|0.00000000|  FAILED  
# The file file_input was rewound 24 times
  diehard_rank_32x32|   0|     40000|     100|0.00047786|   WEAK   
# The file file_input was rewound 30 times
    diehard_rank_6x8|   0|    100000|     100|0.38082242|  PASSED  
# The file file_input was rewound 32 times
   diehard_bitstream|   0|   2097152|     100|0.56232583|  PASSED  
# The file file_input was rewound 53 times
        diehard_opso|   0|   2097152|     100|0.83072458|  PASSED  

I don't know exactly how many samples you'll need to get "better" results, but the "rewound" lines in the output show dieharder running out of data and rereading the file, so failures with only 2.5M numbers are to be expected.
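Before rerunning, it can help to confirm how many numbers a text file actually holds and that its header agrees. The following is only a minimal sketch, assuming the layout shown above (a `#` banner, then `type:`, `count:` and `numbit:` lines, then one number per line). `check_dieharder_file` is a hypothetical helper name, not something dieharder provides:

```shell
#!/bin/sh
# Hypothetical helper (not part of dieharder): compare the "count:" header
# of an ASCII input file with the number of data lines it really contains.
check_dieharder_file() {
    awk '
        /^#/       { next }                  # skip the comment banner
        /^type:/   { type = $2; next }       # header field: value type
        /^count:/  { count = $2; next }      # header field: declared number of values
        /^numbit:/ { numbit = $2; next }     # header field: bits per value
        NF > 0     { n++ }                   # any other non-empty line is one value
        END {
            printf "type=%s numbit=%s declared=%d actual=%d\n", type, numbit, count, n
            exit ((count + 0 == n + 0) ? 0 : 1)   # non-zero status on a mismatch
        }
    ' "$1"
}
```

Calling `check_dieharder_file randdata.txt` prints one summary line and returns a non-zero status if the declared and actual counts differ. It says nothing about the quality of the numbers themselves.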

After some experimentation, though, it seems like the tests start passing with ~120MB of random binary data:

$ dd if=/dev/urandom of=/tmp/random bs=4096 count=30000
30000+0 records in
30000+0 records out
122880000 bytes transferred in 10.873818 secs (11300538 bytes/sec)
$ du -sh /tmp/random
117M    /tmp/random
$ dieharder -a -g 201 -f /tmp/random
#=============================================================================#
#            dieharder version 3.31.1 Copyright 2003 Robert G. Brown          #
#=============================================================================#
   rng_name    |           filename             |rands/second|
 file_input_raw|                     /tmp/random|  1.11e+07  |
#=============================================================================#
        test_name   |ntup| tsamples |psamples|  p-value |Assessment
#=============================================================================#
   diehard_birthdays|   0|       100|     100|0.71230346|  PASSED  
# The file file_input_raw was rewound 3 times
      diehard_operm5|   0|   1000000|     100|0.62093817|  PASSED  
# The file file_input_raw was rewound 7 times
  diehard_rank_32x32|   0|     40000|     100|0.02228171|  PASSED  
# The file file_input_raw was rewound 9 times
    diehard_rank_6x8|   0|    100000|     100|0.20698623|  PASSED  
# The file file_input_raw was rewound 10 times
   diehard_bitstream|   0|   2097152|     100|0.55567887|  PASSED  
# The file file_input_raw was rewound 17 times
        diehard_opso|   0|   2097152|     100|0.20799917|  PASSED  

That corresponds to 122,880,000 / 4 = 30,720,000, so about 31M 32-bit integers.
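The same arithmetic can be scripted to compare file sizes. This is a small sketch, assuming 4 bytes per value as in the calculation above; `words_from_bytes` and `words_in_file` are hypothetical helper names:

```shell
#!/bin/sh
# Number of 32-bit (4-byte) values contained in a given number of bytes.
words_from_bytes() {
    echo $(( $1 / 4 ))
}

# Same calculation for a raw binary file, using its size in bytes from wc -c.
words_in_file() {
    words_from_bytes "$(( $(wc -c < "$1") ))"
}
```

For example, `words_from_bytes 122880000` prints `30720000`, and `words_in_file /tmp/random` would report the same figure for the file created with `dd` above.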

David Wolever answered Oct 20 '22 08:10