
How to calculate coskew and cokurtosis

You can calculate skew and kurtosis with the methods

  • pd.Series.skew
  • pd.Series.kurt
  • pd.DataFrame.skew
  • pd.DataFrame.kurt

However, there is no convenient way to calculate the coskew or cokurtosis between variables or, even better, the coskew or cokurtosis matrix.


Consider the pd.DataFrame df

import pandas as pd
import numpy as np

np.random.seed([3,1415])
df = pd.DataFrame(np.random.rand(10, 2), columns=list('ab'))

df

          a         b
0  0.444939  0.407554
1  0.460148  0.465239
2  0.462691  0.016545
3  0.850445  0.817744
4  0.777962  0.757983
5  0.934829  0.831104
6  0.879891  0.926879
7  0.721535  0.117642
8  0.145906  0.199844
9  0.437564  0.100702

How do I calculate the coskew and cokurtosis of a and b?

piRSquared asked Jan 27 '17 09:01


1 Answer

References

  • Coskewness
  • Cokurtosis

Calculating coskew

My interpretation of coskew is the "correlation" between one series and the variance of another. As such, there are two types of coskew, depending on which series' variance is used. Wikipedia shows these two formulas:

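'left'
$$S(X, X, Y) = \frac{\operatorname{E}\left[(X - \operatorname{E}[X])^2 \, (Y - \operatorname{E}[Y])\right]}{\sigma_X^2 \, \sigma_Y}$$
'right'
$$S(X, Y, Y) = \frac{\operatorname{E}\left[(X - \operatorname{E}[X]) \, (Y - \operatorname{E}[Y])^2\right]}{\sigma_X \, \sigma_Y^2}$$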

Fortunately, when we calculate the coskew matrix, one is the transpose of the other.
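To make the transpose relationship concrete, here is a minimal scalar sketch for a single pair of series. The helper name coskew_pair and the random test data are illustrative assumptions, not part of pandas or NumPy, and the function returns the biased (population) value only.

```python
import numpy as np

def coskew_pair(x, y, variant="left"):
    """Biased (population) coskew of two 1-D arrays (hypothetical helper).

    'left'  -> E[(x - mx)^2 (y - my)] / (sx^2 * sy)
    'right' -> E[(x - mx) (y - my)^2] / (sx * sy^2)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    sx, sy = x.std(), y.std()  # population std (ddof=0)
    if variant == "left":
        return np.mean(dx ** 2 * dy) / (sx ** 2 * sy)
    if variant == "right":
        return np.mean(dx * dy ** 2) / (sx * sy ** 2)
    raise ValueError("variant must be 'left' or 'right'")

# Made-up data purely for illustration
rng = np.random.default_rng(0)
x = rng.random(50)
y = rng.random(50)

# The 'left' coskew of (x, y) equals the 'right' coskew of (y, x),
# which is why the two coskew matrices are transposes of one another.
assert np.isclose(coskew_pair(x, y, "left"), coskew_pair(y, x, "right"))
```

When both arguments are the same series, the two variants coincide and reduce to the ordinary biased skewness of that series.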

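def coskew(df, bias=False):
    v = df.values
    # population standard deviation of each column, shape 1 x n
    s1 = sigma = v.std(0, keepdims=True)
    means = v.mean(0, keepdims=True)

    # means is 1 x n (n is the number of columns),
    # so this difference broadcasts appropriately
    v1 = v - means

    s2 = sigma ** 2

    v2 = v1 ** 2

    m = v.shape[0]

    # entry (i, j) is E[x_i^2 * x_j] / (sigma_i^2 * sigma_j), the 'left' variant
    skew = pd.DataFrame(v2.T.dot(v1) / s2.T.dot(s1) / m, df.columns, df.columns)

    if not bias:
        # sample-size adjustment, so the diagonal matches df.skew()
        skew *= ((m - 1) * m) ** .5 / (m - 2)

    return skew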

demonstration

coskew(df)

          a         b
a -0.369380  0.096974
b  0.325311  0.067020

We can compare this to df.skew() and check that the diagonal of the coskew matrix matches it

df.skew()

a   -0.36938
b    0.06702
dtype: float64
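The diagonal matches because the bias=False branch applies the same sample-size adjustment that pandas uses for skew. A small sketch of that check on a single series, using made-up random data rather than the df above:

```python
import numpy as np
import pandas as pd

# Made-up data purely for illustration
rng = np.random.default_rng(0)
s = pd.Series(rng.random(10))

m = len(s)
d = s.to_numpy() - s.mean()

# Biased (population) skewness: third central moment over sigma cubed
g1 = np.mean(d ** 3) / np.mean(d ** 2) ** 1.5

# Adjusted skewness, using the same factor as the bias=False branch
G1 = g1 * np.sqrt(m * (m - 1)) / (m - 2)

assert np.isclose(G1, s.skew())
```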

Calculating cokurtosis

My interpretation of cokurtosis is one of two things:

  1. "correlation" between a series and the skew of another
  2. "correlation" between the variances of two series

For option 1, we again have both a left and a right variant that, in matrix form, are transposes of one another, so we will only focus on the left variant. Together with option 2 (the 'middle' variant), that leaves a total of two variations to calculate.

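'left'
$$K(X, X, X, Y) = \frac{\operatorname{E}\left[(X - \operatorname{E}[X])^3 \, (Y - \operatorname{E}[Y])\right]}{\sigma_X^3 \, \sigma_Y}$$
'middle'
$$K(X, X, Y, Y) = \frac{\operatorname{E}\left[(X - \operatorname{E}[X])^2 \, (Y - \operatorname{E}[Y])^2\right]}{\sigma_X^2 \, \sigma_Y^2}$$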
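As a scalar illustration of the three variants for one pair of series, here is a minimal sketch. The helper name cokurt_pair and the random data are assumptions made for this example, and the function returns the biased, non-excess value.

```python
import numpy as np

def cokurt_pair(x, y, variant="middle"):
    """Biased, non-excess cokurtosis of two 1-D arrays (hypothetical helper)."""
    # Powers applied to the centered x and y for each variant
    powers = {"left": (3, 1), "middle": (2, 2), "right": (1, 3)}
    if variant not in powers:
        raise ValueError("variant must be 'left', 'middle' or 'right'")
    p, q = powers[variant]
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    # Population std (ddof=0) in the denominator
    return np.mean(dx ** p * dy ** q) / (x.std() ** p * y.std() ** q)

# Made-up data purely for illustration
rng = np.random.default_rng(0)
x = rng.random(50)
y = rng.random(50)

# 'left' and 'right' mirror each other when the arguments are swapped ...
assert np.isclose(cokurt_pair(x, y, "left"), cokurt_pair(y, x, "right"))
# ... while 'middle' is symmetric in its arguments.
assert np.isclose(cokurt_pair(x, y, "middle"), cokurt_pair(y, x, "middle"))
```

This is why only the 'left' and 'middle' matrices need to be computed: the 'right' matrix is the transpose of the 'left' one.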

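def cokurt(df, bias=False, fisher=True, variant='middle'):
    v = df.values
    # population standard deviation of each column, shape 1 x n
    s1 = sigma = v.std(0, keepdims=True)
    means = v.mean(0, keepdims=True)

    # means is 1 x n (n is the number of columns),
    # so this difference broadcasts appropriately
    v1 = v - means

    s2 = sigma ** 2
    s3 = sigma ** 3

    v2 = v1 ** 2
    v3 = v1 ** 3

    m = v.shape[0]

    if variant in ['left', 'right']:
        kurt = pd.DataFrame(v3.T.dot(v1) / s3.T.dot(s1) / m, df.columns, df.columns)
        if variant == 'right':
            kurt = kurt.T
    elif variant == 'middle':
        kurt = pd.DataFrame(v2.T.dot(v2) / s2.T.dot(s2) / m, df.columns, df.columns)
    else:
        raise ValueError("variant must be 'left', 'right' or 'middle'")

    if bias:
        # the raw ratio is plain kurtosis; subtract 3 to get excess kurtosis
        kurt = kurt - 3
    else:
        # sample-size adjustment; the result is excess (Fisher) kurtosis
        kurt = kurt * (m ** 2 - 1) / (m - 2) / (m - 3) - 3 * (m - 1) ** 2 / (m - 2) / (m - 3)
    if not fisher:
        # convert excess kurtosis back to plain (Pearson) kurtosis
        kurt += 3

    return kurt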

demonstration

cokurt(df, variant='middle', bias=False, fisher=False)

          a        b
a  1.882817  0.86649
b  0.866490  1.63200

cokurt(df, variant='left', bias=False, fisher=False)

          a        b
a  1.882817  0.19175
b -0.020567  1.63200

The diagonal should be equal to the kurtosis. Since df.kurtosis() returns excess kurtosis, we add 3 to compare

df.kurtosis() + 3

a    1.882817
b    1.632000
dtype: float64
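The sample-size adjustment in the bias=False branch can be checked against pandas on a single series. A minimal sketch, assuming made-up random data rather than the df above:

```python
import numpy as np
import pandas as pd

# Made-up data purely for illustration
rng = np.random.default_rng(0)
s = pd.Series(rng.random(10))

m = len(s)
d = s.to_numpy() - s.mean()

# Biased, non-excess kurtosis: fourth central moment over sigma^4
k_raw = np.mean(d ** 4) / np.mean(d ** 2) ** 2

# Adjusted excess kurtosis, using the same expression as the bias=False branch
k_adj = k_raw * (m ** 2 - 1) / (m - 2) / (m - 3) - 3 * (m - 1) ** 2 / (m - 2) / (m - 3)

assert np.isclose(k_adj, s.kurt())
```

Adding 3 to k_adj gives the non-excess value that appears on the diagonal when fisher=False.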
piRSquared answered Nov 10 '22 12:11