In a unit test I need to check that the distribution of the values in an array is uniform. For example:
in an array = [1, 0, 1, 0, 1, 1, 0, 0]
the values are uniformly distributed, since there are four "1"s and four "0"s.
For larger arrays, the distribution should be closer to uniform.
How do I show that the array under test has a uniform distribution?
Note: the array is created with random.randint(min, max, len) from numpy.random.
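For reference, a minimal sketch of that setup (the counting with np.bincount and the bounds/length used for generation are illustrative assumptions, since the question only gives random.randint(min, max, len)):

import numpy as np

# the example array from the question
A = np.array([1, 0, 1, 0, 1, 1, 0, 0])

# count occurrences of each value; for a uniform distribution
# the counts should be roughly equal
print(np.bincount(A))   # [4 4]

# the real array is generated like this (bounds and length are illustrative)
B = np.random.randint(0, 2, 1000)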
For continuous distributions there is the Kolmogorov–Smirnov test; for discrete distributions there is the chi-square test. – behzad.nouri Mar 13 '14 at 23:17
You can use the Kolmogorov–Smirnov test for continuous and discrete distributions. It is provided as scipy.stats.kstest:
http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstest.html#scipy.stats.kstest
In [12]:
import scipy.stats as ss
import numpy as np
In [14]:
A = np.random.randint(0, 10, 100)
In [16]:
ss.kstest(A, ss.randint.cdf, args=(0, 10))
# args is a tuple containing the extra parameters required by ss.randint.cdf,
# in this case the lower and upper bound
Out[16]:
(0.12, 0.10331653831438881)
# This is a tuple of two values: the KS test statistic (D, D+ or D-) and the p-value.
Here the resulting p-value is 0.1033, so we conclude that the array A is not significantly different from a uniform distribution. The way to think about the p-value is that it measures the probability of getting a test statistic as extreme as the one observed (here, the first number in the tuple) assuming the null hypothesis is true. In the KS test the null hypothesis is that A is not different from a uniform distribution. A p-value of 0.1033 is usually not considered extreme enough to reject the null hypothesis; typically the p-value has to be less than 0.05 or 0.01 to reject the null. If the p-value in this example were less than 0.05, we would say that A is significantly different from a uniform distribution.
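Since the original goal is a unit test, here is a minimal sketch of how the KS test result could be turned into an assertion (the 0.05 threshold and the unittest framing are my assumptions, not part of the answer above):

import unittest
import numpy as np
import scipy.stats as ss

class TestUniformity(unittest.TestCase):
    def test_randint_is_roughly_uniform(self):
        A = np.random.randint(0, 10, 100)
        # kstest returns (statistic, p-value); a large p-value means we
        # cannot reject the hypothesis that A follows randint(0, 10)
        statistic, pvalue = ss.kstest(A, ss.randint.cdf, args=(0, 10))
        self.assertGreater(pvalue, 0.05)   # assumed significance level

if __name__ == '__main__':
    unittest.main()

Note that a test written this way will fail by chance roughly 5% of the time, so fixing the random seed or using a stricter threshold may be preferable.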
An alternative is to use scipy.stats.chisquare():
In [17]:
import scipy.stats as ss
import numpy as np
In [18]:
A = np.random.randint(0, 10, 100)
In [19]:
FRQ = (A == np.arange(10)[..., np.newaxis]).sum(axis=1) * 1. / A.size  # observed frequency of each value, as a proportion
In [20]:
ss.chisquare(FRQ)  # if not specified, the expected frequencies default to uniform across categories
Out[20]:
(0.084000000000000019, 0.99999998822800984)
The first value is the chi-square statistic and the second value is the p-value.
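One caveat: scipy.stats.chisquare expects observed frequencies as counts, so passing the raw counts is the more usual form than the proportions used above. A minimal sketch of that variant (np.bincount with minlength is my addition, not part of the answer above):

import numpy as np
import scipy.stats as ss

A = np.random.randint(0, 10, 100)
counts = np.bincount(A, minlength=10)   # observed count of each value 0..9
# with no expected frequencies given, chisquare assumes equal counts per category
statistic, pvalue = ss.chisquare(counts)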