What is the difference between skew and kurtosis functions in pandas vs. scipy?

I decided to compare skew and kurtosis functions in pandas and scipy.stats, and don't understand why I'm getting different results between libraries.

As far as I can tell from the documentation, both kurtosis functions use Fisher's definition, whereas for skew the description isn't detailed enough to tell whether there are any major differences in how the two are computed.

import numpy as np
import pandas as pd
import scipy.stats as st

heights = np.array([1.46, 1.79, 2.01, 1.75, 1.56, 1.69, 1.88, 1.76, 1.88, 1.78])

print("skewness:", st.skew(heights))
print("kurtosis:", st.kurtosis(heights))

this returns:

skewness: -0.393524456473
kurtosis: -0.330672097724

whereas if I convert to a pandas dataframe:

heights_df = pd.DataFrame(heights)
print("skewness:", heights_df.skew())
print("kurtosis:", heights_df.kurtosis())

this returns:

skewness: 0   -0.466663
kurtosis: 0    0.379705

Apologies if I've posted this in the wrong place; not sure if it's a stats or a programming question.

lin_bug asked Oct 13 '15 17:10



1 Answer

The difference is due to different normalizations. Scipy by default does not correct for bias, whereas pandas does.

You can tell scipy to correct for bias by passing the bias=False argument:

>>> import numpy as np
>>> import pandas
>>> from scipy import stats
>>> x = pandas.Series(np.random.randn(10))
>>> stats.skew(x)
-0.17644348972413657
>>> x.skew()
-0.20923623968879457
>>> stats.skew(x, bias=False)
-0.2092362396887948
>>> stats.kurtosis(x)
0.6362620964462327
>>> x.kurtosis()
2.0891062062174464
>>> stats.kurtosis(x, bias=False)
2.089106206217446
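
For reference, the two normalizations can be reproduced by hand. The sketch below assumes the usual sample-moment definitions for the uncorrected values (g1, g2) and the standard small-sample adjustments for the corrected ones (G1, G2). The helper names are made up for illustration and belong to neither library. Applied to the heights from the question, the uncorrected pair should agree with the scipy figures and the corrected pair with the pandas figures.

```python
import numpy as np

def biased_skew_kurt(x):
    """Moment-based skewness g1 and excess (Fisher) kurtosis g2, no bias correction."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d**2)  # second central moment (divides by n, not n - 1)
    m3 = np.mean(d**3)  # third central moment
    m4 = np.mean(d**4)  # fourth central moment
    return m3 / m2**1.5, m4 / m2**2 - 3.0

def corrected_skew_kurt(x):
    """Small-sample adjusted skewness G1 and excess kurtosis G2 (needs n > 3)."""
    n = len(x)
    g1, g2 = biased_skew_kurt(x)
    G1 = np.sqrt(n * (n - 1)) / (n - 2) * g1
    G2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return G1, G2

# Data from the question
heights = [1.46, 1.79, 2.01, 1.75, 1.56, 1.69, 1.88, 1.76, 1.88, 1.78]

g1, g2 = biased_skew_kurt(heights)
G1, G2 = corrected_skew_kurt(heights)
print("uncorrected skew, kurtosis:", g1, g2)
print("corrected skew, kurtosis:  ", G1, G2)
```

With only 10 observations the correction factors are large, which is why the kurtosis even changes sign between the two libraries; the gap shrinks as n grows.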

There does not appear to be a way to tell pandas to remove the bias correction.
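
One possible workaround, sketched here rather than taken from any pandas option: hand the pandas data to the scipy functions, which leave the estimate uncorrected by default. The DataFrame and its column name "height" are illustrative only.

```python
import numpy as np
import pandas as pd
from scipy import stats

heights = np.array([1.46, 1.79, 2.01, 1.75, 1.56, 1.69, 1.88, 1.76, 1.88, 1.78])
heights_df = pd.DataFrame({"height": heights})  # hypothetical column name

# Apply scipy's default (bias=True) estimators column by column.
# Converting each column to a NumPy array keeps scipy's input simple.
uncorrected_skew = heights_df.apply(lambda col: stats.skew(col.to_numpy()))
uncorrected_kurt = heights_df.apply(lambda col: stats.kurtosis(col.to_numpy()))

print(uncorrected_skew)
print(uncorrected_kurt)
```

Each result is a Series indexed by column name, so it can be used in the same places as the output of heights_df.skew() or heights_df.kurtosis().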

BrenBarn answered Oct 20 '22 15:10