I decided to compare skew and kurtosis functions in pandas and scipy.stats, and don't understand why I'm getting different results between libraries.
As far as I can tell from the documentation, both kurtosis functions use Fisher's definition, whereas for skew the descriptions are not detailed enough to tell whether there are any major differences in how the two are computed.
import numpy as np
import pandas as pd
from scipy import stats as st  # public namespace; scipy.stats.stats is deprecated
heights = np.array([1.46, 1.79, 2.01, 1.75, 1.56, 1.69, 1.88, 1.76, 1.88, 1.78])
print("skewness:", st.skew(heights))
print("kurtosis:", st.kurtosis(heights))
this returns:
skewness: -0.393524456473
kurtosis: -0.330672097724
whereas if I convert to a pandas dataframe:
heights_df = pd.DataFrame(heights)
print("skewness:", heights_df.skew())
print("kurtosis:", heights_df.kurtosis())
this returns:
skewness: 0 -0.466663
kurtosis: 0 0.379705
Apologies if I've posted this in the wrong place; not sure if it's a stats or a programming question.
Skewness measures the asymmetry of a distribution, while kurtosis indicates how heavy its tails are compared to a normal distribution. The normal distribution is the usual reference point: it has zero skewness and a Pearson kurtosis of 3 (a Fisher, or excess, kurtosis of 0).
Pandas DataFrame skew() method: by default, skew() calculates the skewness of each column. Specifying the column axis (axis='columns') makes it compute across the columns instead and return the skewness of each row.
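To make the axis behaviour concrete, here is a minimal sketch. The DataFrame and its column names are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical data, chosen only for illustration.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],   # symmetric, so the skew should be about 0
    "b": [1.0, 1.0, 2.0, 9.0],   # long right tail, positive skew
    "c": [0.0, 5.0, 6.0, 6.5],   # long left tail, negative skew
})

col_skew = df.skew()                 # default axis=0: one value per column
row_skew = df.skew(axis="columns")   # across the columns: one value per row

print(col_skew)
print(row_skew)
```

Note that skewness needs at least three values along the chosen axis, which is why the example has three columns.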
The pandas DataFrame method kurtosis() computes the kurtosis of a set of values along a given axis (i.e., per column or per row). It returns Fisher's kurtosis, which is obtained by subtracting three from Pearson's kurtosis.
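The relationship between the two definitions can be checked with scipy, whose kurtosis function takes a fisher flag. This is only a sketch using the heights array from the question; the manual moment calculation is added for comparison.

```python
import numpy as np
from scipy import stats

heights = np.array([1.46, 1.79, 2.01, 1.75, 1.56, 1.69, 1.88, 1.76, 1.88, 1.78])

fisher = stats.kurtosis(heights)                  # fisher=True is the default (excess kurtosis)
pearson = stats.kurtosis(heights, fisher=False)   # Pearson's definition

# Pearson's kurtosis by hand: fourth central moment over squared variance.
dev = heights - heights.mean()
m2 = np.mean(dev ** 2)
m4 = np.mean(dev ** 4)
pearson_manual = m4 / m2 ** 2

print(fisher, pearson, pearson_manual)
```

Both calls here use scipy's default bias=True, so they differ from each other by exactly three and nothing else.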
The difference is due to different normalizations. Scipy by default does not correct for bias, whereas pandas does.
You can tell scipy to correct for bias by passing the bias=False argument:
>>> import numpy as np
>>> import pandas
>>> from scipy import stats
>>> x = pandas.Series(np.random.randn(10))
>>> stats.skew(x)
-0.17644348972413657
>>> x.skew()
-0.20923623968879457
>>> stats.skew(x, bias=False)
-0.2092362396887948
>>> stats.kurtosis(x)
0.6362620964462327
>>> x.kurtosis()
2.0891062062174464
>>> stats.kurtosis(x, bias=False)
2.089106206217446
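For reference, the correction can also be written out by hand. The sketch below assumes the standard adjusted Fisher-Pearson formulas (here called G1 and G2, names of my choosing) and applies them to the heights array from the question; the results should agree with the pandas output reported there.

```python
import numpy as np
import pandas as pd
from scipy import stats

heights = np.array([1.46, 1.79, 2.01, 1.75, 1.56, 1.69, 1.88, 1.76, 1.88, 1.78])
n = len(heights)

g1 = stats.skew(heights)       # biased skewness (scipy default)
g2 = stats.kurtosis(heights)   # biased excess kurtosis (scipy default)

# Sample-size corrections (assumed: adjusted Fisher-Pearson estimators).
G1 = np.sqrt(n * (n - 1)) / (n - 2) * g1
G2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

s = pd.Series(heights)
print("skewness:", G1, s.skew())
print("kurtosis:", G2, s.kurtosis())
```

With only 10 observations the correction factors are large, which is why the kurtosis even changes sign between the two libraries.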
There does not appear to be a way to tell pandas to remove the bias correction.
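One possible workaround, if you want the uncorrected values for data that lives in a DataFrame, is to hand the columns to scipy yourself. This is just a sketch; the column name "height" is made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

heights_df = pd.DataFrame(
    {"height": [1.46, 1.79, 2.01, 1.75, 1.56, 1.69, 1.88, 1.76, 1.88, 1.78]}
)

# apply() passes each column to the scipy function, which defaults to bias=True.
biased_skew = heights_df.apply(stats.skew)
biased_kurt = heights_df.apply(stats.kurtosis)

print(biased_skew)
print(biased_kurt)
```

Keep in mind that pandas skips NaN values by default while scipy propagates them, so for data with missing values you would probably also want to pass nan_policy="omit" to the scipy functions.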