How can I check the distribution of a variable in Python? [closed]

1 Answer

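You can use the Kolmogorov-Smirnov (KS) test, which is provided by scipy.stats.kstest: http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kstest.html#scipy.stats.kstest. The test is designed for continuous distributions; on discrete data, as in the example below, it tends to be conservative, so treat the p-value as approximate.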

In [12]:

import scipy.stats as ss
import numpy as np
In [14]:

A=np.random.randint(0,10,100)
In [16]:

ss.kstest(A, ss.randint.cdf, args=(0,10))
#args is a tuple containing the extra parameters required by ss.randint.cdf, in this case the lower bound (inclusive) and the upper bound (exclusive)
Out[16]:
(0.12, 0.10331653831438881)
#This is a tuple of two values: the KS test statistic (D, D+ or D-) and the p-value

Here the resulting p-value is 0.1033, so we cannot reject the hypothesis that the array A comes from a uniform distribution; in other words, A is not significantly different from uniform. The p-value is the probability of getting a test statistic at least as extreme as the one observed (here, the first number in the tuple), assuming the null hypothesis is true. In the KS test, the null hypothesis is that A follows the reference distribution, here a discrete uniform on the integers 0 to 9. A p-value of 0.1033 is usually not considered extreme enough to reject the null hypothesis; conventionally the p-value has to be below 0.05 or 0.01. If the p-value in this example were below 0.05, we would say A is significantly different from a uniform distribution.
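To make the decision rule concrete, here is a minimal sketch on continuous data, where the KS test applies directly. The seed, sample sizes, the 0.05 threshold, and the skewed comparison sample are all assumptions chosen for illustration; none of them come from the answer above.

```python
import numpy as np
import scipy.stats as ss

# Assumption: a fixed seed so the sketch is repeatable.
rng = np.random.default_rng(0)
alpha = 0.05  # illustrative significance level

# Sample that really is Uniform(0, 1): the null hypothesis is true here.
uniform_sample = rng.uniform(0.0, 1.0, size=200)
uniform_result = ss.kstest(uniform_sample, "uniform")

# Sample that is clearly not uniform (Beta(5, 1) piles up near 1).
skewed_sample = rng.beta(5.0, 1.0, size=200)
skewed_result = ss.kstest(skewed_sample, "uniform")

# Reject the null only when the p-value falls below alpha.
reject_uniform = uniform_result.pvalue < alpha
reject_skewed = skewed_result.pvalue < alpha

print("uniform sample:", uniform_result.statistic, uniform_result.pvalue, reject_uniform)
print("skewed sample: ", skewed_result.statistic, skewed_result.pvalue, reject_skewed)
```

The skewed sample should be rejected decisively. The uniform sample will usually not be, although by construction about 5% of truly uniform samples would be rejected at this threshold.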

An alternative is to use scipy.stats.chisquare():

In [17]:

import scipy.stats as ss
import numpy as np
In [18]:

A=np.random.randint(0, 10, 100)
In [19]:

FRQ=(A==np.arange(10)[...,np.newaxis]).sum(axis=1)*1./A.size #observed relative frequency of each value 0..9 (not the expected frequencies)
In [20]:

ss.chisquare(FRQ) #If not specified, the default expected frequency is uniform across categories.
Out[20]:
(0.084000000000000019, 0.99999998822800984)

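The first value is the chi-square statistic and the second value is the p-value. Note that scipy.stats.chisquare expects observed counts, not relative frequencies. Because FRQ was divided by A.size (100 here), the statistic shown is 100 times smaller than it would be with raw counts, and the p-value is correspondingly overstated. Pass the counts themselves to get a valid test.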
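A hedged sketch of the same check on raw counts follows. It uses np.bincount to tabulate the values, which is a substitute for the broadcasting expression above rather than the answer's own method, and the seed is an assumption added for repeatability.

```python
import numpy as np
import scipy.stats as ss

# Assumption: a fixed seed so the sketch is repeatable.
rng = np.random.default_rng(0)
A = rng.integers(0, 10, size=100)  # integers 0..9, like np.random.randint(0, 10, 100)

# Observed counts for each category 0..9 (minlength keeps empty categories).
counts = np.bincount(A, minlength=10)

# With no f_exp given, the expected counts are uniform across categories.
result = ss.chisquare(counts)
print(result.statistic, result.pvalue)

# Feeding proportions instead shrinks the statistic by a factor of A.size.
scaled = ss.chisquare(counts / A.size)
print(scaled.statistic * A.size)  # matches result.statistic up to rounding
```

Using counts keeps the statistic on the scale that the chi-square distribution with 9 degrees of freedom assumes, so the reported p-value is meaningful.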

answered Oct 06 '22 00:10

CT Zhu

Tags:

python

arrays

random

numpy

statistics

eduardo.sufan
