How to find skewness and kurtosis correctly in pandas?

I was wondering how to calculate skewness and kurtosis correctly in pandas. pandas returns values for skew() and kurtosis(), but they differ noticeably from the scipy.stats values. Which one should I trust, pandas or scipy.stats?

Here is my code:

import numpy as np
import scipy.stats as stats
import pandas as pd

np.random.seed(100)
x = np.random.normal(size=(20))

kurtosis_scipy = stats.kurtosis(x)
kurtosis_pandas = pd.DataFrame(x).kurtosis()[0]

print(kurtosis_scipy, kurtosis_pandas)
# -0.5270409758168872
# -0.31467107631025604

skew_scipy = stats.skew(x)
skew_pandas = pd.DataFrame(x).skew()[0]

print(skew_scipy, skew_pandas)
# -0.41070929017558555
# -0.44478877631598901

Versions:

import scipy  # `import scipy.stats as stats` does not bind the name `scipy`
print(np.__version__, pd.__version__, scipy.__version__)
# 1.11.0 0.20.0 0.19.0
asked Jun 25 '19 16:06 by BhishanPoudel




2 Answers

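Pass bias=False to the scipy.stats functions. They default to bias=True (the biased estimators), whereas pandas returns the bias-corrected values: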

print(
    stats.kurtosis(x, bias=False), pd.DataFrame(x).kurtosis()[0],
    stats.skew(x, bias=False), pd.DataFrame(x).skew()[0],
    sep='\n'
)

-0.31467107631025515
-0.31467107631025604
-0.4447887763159889
-0.444788776315989
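
The two sets of numbers are related by a simple sample-size correction. The following is a minimal sketch of that relationship, not part of the answer itself; the variable names g1, g2, G1, G2 are chosen here for illustration, and the data reuses the seed from the question.

```python
import numpy as np
import pandas as pd
from scipy import stats

np.random.seed(100)
x = np.random.normal(size=20)
n = x.size

# Biased estimators: the scipy.stats defaults (bias=True).
g1 = stats.skew(x)
g2 = stats.kurtosis(x)  # excess kurtosis (Fisher definition)

# Apply the sample-size corrections by hand.
G1 = g1 * np.sqrt(n * (n - 1)) / (n - 2)
G2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

# Each corrected value should agree with scipy (bias=False) and with pandas.
print(G1, stats.skew(x, bias=False), pd.Series(x).skew())
print(G2, stats.kurtosis(x, bias=False), pd.Series(x).kurt())
```

The correction factors approach 1 as n grows, so the gap between the two libraries' defaults matters mostly for small samples such as the n = 20 used here.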
answered Oct 17 '22 18:10 by piRSquared



pandas calculates the unbiased estimator of the population excess kurtosis. See Wikipedia for the formulas: https://www.wikiwand.com/en/Kurtosis
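
As a reading aid, the estimator implemented by the from-scratch code further down can be written out as follows. The symbols mirror that code's variable names: k_2 is the sample variance computed with ddof=1, and G_2 is the resulting excess kurtosis.

```latex
G_2 = \frac{(n+1)\,n}{(n-1)(n-2)(n-3)}
      \cdot \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{k_2^{2}}
      \;-\; \frac{3\,(n-1)^2}{(n-2)(n-3)},
\qquad
k_2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
```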


Calculate kurtosis from scratch

import numpy as np
import pandas as pd
import scipy.stats  # import the submodule explicitly so scipy.stats.kurtosis resolves

x = np.array([0, 3, 4, 1, 2, 3, 0, 2, 1, 3, 2, 0,
              2, 2, 3, 2, 5, 2, 3, 999])
xbar = np.mean(x)
n = x.size
k2 = x.var(ddof=1) # unbiased sample variance; NumPy's default is ddof=0 (biased)
sum_term = ((x-xbar)**4).sum()
factor = (n+1) * n / (n-1) / (n-2) / (n-3)
second = - 3 * (n-1) * (n-1) / (n-2) / (n-3)

first = factor * sum_term / k2 / k2

G2 = first + second
G2 # 19.998428728659768

Calculate kurtosis using numpy/scipy

scipy.stats.kurtosis(x,bias=False) # 19.998428728659757

Calculate kurtosis using pandas

pd.DataFrame(x).kurtosis() # 19.998429

Similarly, you can also calculate skewness.
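
A minimal sketch of the skewness case, assuming the same data as above. The names m2, m3, g1 and G1 are illustrative; the correction used is the adjusted Fisher-Pearson one, which is what scipy.stats.skew applies when bias=False.

```python
import numpy as np
import pandas as pd
from scipy import stats

x = np.array([0, 3, 4, 1, 2, 3, 0, 2, 1, 3, 2, 0,
              2, 2, 3, 2, 5, 2, 3, 999])
n = x.size
xbar = x.mean()

# Biased central moments (divide by n).
m2 = ((x - xbar) ** 2).mean()
m3 = ((x - xbar) ** 3).mean()

g1 = m3 / m2 ** 1.5                        # biased sample skewness
G1 = np.sqrt(n * (n - 1)) / (n - 2) * g1   # bias-corrected skewness

# All three values should agree up to floating-point error.
print(G1, stats.skew(x, bias=False), pd.Series(x).skew())
```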

answered Oct 17 '22 18:10 by BhishanPoudel
