Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

A Good and SIMPLE Measure of Randomness

What is the best algorithm to take a long sequence of integers (say 100,000 of them) and return a measurement of how random the sequence is?

The function should return a single result, say 0 if the sequence is not all all random, up to, say 1 if perfectly random. It can give something in-between if the sequence is somewhat random, e.g. 0.95 might be a reasonably random sequence, whereas 0.50 might have some non-random parts and some random parts.

If I were to pass the first 100,000 digits of Pi to the function, it should give a number very close to 1. If I passed the sequence 1, 2, ... 100,000 to it, it should return 0.

This way I can easily take 30 sequences of numbers, identify how random each one is, and return information about their relative randomness.

Is there such an animal?

…..

Update 24-Sep-2019: Google may have just ushered in an era of quantum supremacy says:

"Google’s quantum computer was reportedly able to solve a calculation — proving the randomness of numbers produced by a random number generator — in 3 minutes and 20 seconds that would take the world’s fastest traditional supercomputer, Summit, around 10,000 years. This effectively means that the calculation cannot be performed by a traditional computer, making Google the first to demonstrate quantum supremacy."

So obviously there is an algorithm to "prove" randomness. Does anyone know what it is? Could this algorithm also provide a measure of randomness?

like image 460
lkessler Avatar asked Sep 24 '09 21:09

lkessler


People also ask

What is a measure of randomness?

In this regard, Approximate Entropy (ApEn) is a statistical measure of the level of randomness of a data series which is based on counting patterns and their repetitions. Low levels of this statistic indicate the existence of many repeated patterns, and high values indicate randomness and unpredictability.

Which test is used for randomness?

Runs test is a statistical procedure which determines whether a sequence of data within a given distribution have been derived with a random process or not. It may be applied to test the randomness of data in a survey that collect data from an ordered population.

Is a measure of randomness or uncertainty?

The term entropy is used for this measure of randomness or uncertainty since (a) it has many of the same properties as H in Equation (1) and (b) the entropy term has been used in such a variety of measurement situations for which can similarly be used.

How is randomness used in statistics?

A numeric sequence is said to be statistically random when it contains no recognizable patterns or regularities; sequences such as the results of an ideal dice roll or the digits of π exhibit statistical randomness.


2 Answers

Your question answers itself. "If I were to pass the first 100,000 digits of Pi to the function, it should give a number very close to 1", except the digits of Pi are not random numbers so if your algorithm does not recognise a very specific sequence as being non-random then it's not very good.

The problem here is there are many types of non random-ness:- eg. "121,351,991,7898651,12398469018461" or "33,27,99,3000,63,231" or even "14297141600464,14344872783104,819534228736,3490442496" are definitely not random.

I think what you need to do is identify the aspects of randomness that are important to you- distribution, distribution of digits, lack of common factors, the expected number of primes, Fibonacci and other "special" numbers etc. etc.

PS. The Quick and Dirty (and very effective) test of randomness does the file end up roughly the same size after you gzip it.

like image 121
James Anderson Avatar answered Sep 21 '22 03:09

James Anderson


It can be done this way:

CAcert Research Lab does a Random Number Generator Analysis.

Their results page evaluates each random sequence using 7 tests (Entropy, Birthday Spacing, Matrix Ranks, 6x8 Matrix Ranks, Minimum Distance, Random Spheres, and the Squeeze). Each test result is then color coded as one of "No Problems", "Potentially deterministic" and "Not Random".

So a function can be written that accepts a random sequence and does the 7 tests. If any of the 7 tests are "Not Random" then the function returns a 0. If all of the 7 tests are "No Problems", then it returns a 1. Otherwise, it can return some number in-between based on how many tests come in as "Potentially Deterministic".

The only thing missing from this solution is the code for the 7 tests.

like image 24
lkessler Avatar answered Sep 18 '22 03:09

lkessler