Skewness is a parameter that measures the asymmetry of a data set, and kurtosis measures how heavy its tails are compared to a normal distribution.
scipy.stats provides an easy way to calculate these two quantities; see scipy.stats.kurtosis and scipy.stats.skew.
In my understanding, the skewness and kurtosis of a normal distribution should both be 0 using the functions just mentioned. That is, however, not the case with my code:
import numpy as np
from scipy.stats import kurtosis
from scipy.stats import skew

x = np.linspace(-5, 5, 1000)
y = 1./(np.sqrt(2.*np.pi)) * np.exp(-.5*x**2)  # normal distribution

print('excess kurtosis of normal distribution (should be 0): {}'.format(kurtosis(y)))
print('skewness of normal distribution (should be 0): {}'.format(skew(y)))
The output is:
excess kurtosis of normal distribution (should be 0): -0.307393087742
skewness of normal distribution (should be 0): 1.11082371392
What am I doing wrong?
The versions I am using are:
python: 2.7.6
scipy : 0.17.1
numpy : 1.12.1
A general guideline for skewness is that a value greater than +1 or lower than -1 indicates a substantially skewed distribution. For (excess) kurtosis, the general guideline is that a value greater than +1 indicates the distribution is too peaked, and a value lower than -1 indicates it is too flat.
Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point. Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution.
If the kurtosis is greater than 3, then the dataset has heavier tails than a normal distribution (more in the tails). If the kurtosis is less than 3, then the dataset has lighter tails than a normal distribution (less in the tails).
“Skewness essentially measures the symmetry of the distribution, while kurtosis determines the heaviness of the distribution tails.” Understanding the shape of the data is a crucial step: it helps to see where most of the information lies and to analyze the outliers in a given data set.
scipy.stats.kurtosis(array, axis=0, fisher=True, bias=True) calculates the kurtosis (Fisher or Pearson) of a data set: the fourth central moment divided by the square of the variance.
As far as I can tell from the documentation, both kurtosis functions compute using Fisher's definition, whereas for skew there doesn't seem to be enough of a description to tell if there are any major differences in how they are computed.
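As a sanity check of the two definitions, the following sketch (with a seeded NumPy generator, so the numbers are reproducible but otherwise arbitrary) shows that scipy's fisher=True and fisher=False results differ by exactly 3:

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
x = rng.normal(0, 1, 100_000)

# Fisher's definition (the default): excess kurtosis, ~0 for a normal sample
k_fisher = kurtosis(x, fisher=True)
# Pearson's definition: plain kurtosis, ~3 for a normal sample
k_pearson = kurtosis(x, fisher=False)

print(k_fisher, k_pearson)
# The two definitions differ by the constant 3
print(k_pearson - k_fisher)
```

So the choice of fisher only shifts the result by 3; it does not change how the moments themselves are estimated.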
The scipy documentation also suggests using kurtosistest to see whether a result is close enough to normal. kurtosis takes the data for which the kurtosis is calculated; if axis is an int, the statistic is computed along that axis of the input, and the statistic of each axis-slice (e.g. row) appears in a corresponding element of the output.
The kurtosis of a normal distribution is 3. If a given distribution has a kurtosis less than 3, it is said to be platykurtic, which means it tends to produce fewer and less extreme outliers than the normal distribution.
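To illustrate the terminology, here is a small sketch (the particular distributions are illustrative choices, not from the question) comparing a platykurtic uniform sample with a heavier-tailed Student's t sample:

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(42)
u = rng.uniform(-1, 1, 100_000)        # uniform: Pearson kurtosis is exactly 1.8
t = rng.standard_t(df=5, size=100_000)  # Student's t (df=5): heavy tails

k_u = kurtosis(u, fisher=False)  # < 3: platykurtic, fewer extreme outliers
k_t = kurtosis(t, fisher=False)  # > 3: leptokurtic, more extreme outliers
print(k_u, k_t)
```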
These functions calculate moments of the probability distribution (that's why they take only one parameter) and don't care about the "functional form" of the values.
These are meant for "random datasets" (think of them as measures like mean, standard deviation, variance):
import numpy as np
from scipy.stats import kurtosis, skew

x = np.random.normal(0, 2, 10000)  # create random values based on a normal distribution

print('excess kurtosis of normal distribution (should be 0): {}'.format(kurtosis(x)))
print('skewness of normal distribution (should be 0): {}'.format(skew(x)))
which gives:
excess kurtosis of normal distribution (should be 0): -0.024291887786943356
skewness of normal distribution (should be 0): 0.009666157036010928
Increasing the number of random values improves the accuracy:
x = np.random.normal(0, 2, 10000000)
Leading to:
excess kurtosis of normal distribution (should be 0): -0.00010309478605163847
skewness of normal distribution (should be 0): -0.0006751744848755031
In your case the functions "assume" that each value has the same "probability" (because the values are equally spaced and each value occurs only once), so from the point of view of skew and kurtosis they are dealing with a non-Gaussian probability density (the distribution of the density heights themselves), which explains why the resulting values aren't even close to 0:
import numpy as np
from scipy.stats import kurtosis, skew
import matplotlib.pyplot as plt

x_random = np.random.normal(0, 2, 10000)
x = np.linspace(-5, 5, 10000)
y = 1./(np.sqrt(2.*np.pi)) * np.exp(-.5*x**2)  # normal distribution

f, (ax1, ax2) = plt.subplots(1, 2)
ax1.hist(x_random, bins='auto')
ax1.set_title('probability density (random)')
ax2.hist(y, bins='auto')
ax2.set_title('(your dataset)')
plt.tight_layout()
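If the goal really is the moments of the density curve itself rather than of a sample, one workaround (a sketch, not part of the original answer) is to treat the grid values y as weights when forming the central moments; this recovers values close to 0 for the question's setup:

```python
import numpy as np

x = np.linspace(-5, 5, 1000)
y = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)  # N(0, 1) density on a grid

# Treat y as weights on the grid points, not as samples
mean = np.average(x, weights=y)
var = np.average((x - mean) ** 2, weights=y)
skew_w = np.average((x - mean) ** 3, weights=y) / var ** 1.5
kurt_w = np.average((x - mean) ** 4, weights=y) / var ** 2 - 3  # excess kurtosis

print(skew_w, kurt_w)  # both approximately 0
```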
You are using as data the "shape" of the density function. These functions are meant to be used with data sampled from a distribution. If you sample from the distribution, you will obtain sample statistics that will approach the correct value as you increase the sample size. To plot the data, I would recommend a histogram.
%matplotlib inline
import numpy as np
import pandas as pd
from scipy.stats import kurtosis
from scipy.stats import skew
import matplotlib.pyplot as plt

plt.style.use('ggplot')
data = np.random.normal(0, 1, 10000000)
np.var(data)
plt.hist(data, bins=60)
print("mean : ", np.mean(data))
print("var  : ", np.var(data))
print("skew : ", skew(data))
print("kurt : ", kurtosis(data))
Output:
mean : 0.000410213500847
var  : 0.999827716979
skew : 0.00012294118186476907
kurt : 0.0033554829466604374
Unless you are dealing with an analytical expression, it is extremely unlikely that you will obtain a zero when using data.
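For the analytical values themselves, scipy's distribution objects can return the exact skewness and excess kurtosis directly, e.g. via norm.stats:

```python
from scipy.stats import norm

# Exact (analytical) skewness and excess kurtosis of the normal distribution
s, k = norm.stats(moments='sk')
print(s, k)  # both are exactly 0
```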