Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is my Kurtosis function not producing the same output as scipy.stats.kurtosis?

I have a homework problem in which I'm supposed to write a function for Kurtosis as descirbed here:

Kurtosis, where theta is the standard deviation

The theta in the denominator is the standard deviation (square-root of the variance) and the x-with-the-bar in the numerator is the mean of x.

I've implemented the function as follows:

import numpy as np
from scipy.stats import kurtosis

testdata = np.array([1, 2, 3, 4, 5])

def mean(obs):
    return (1. / len(obs)) * np.sum(obs)

def variance(obs):
    return (1. / len(obs)) * np.sum((obs - mean(obs)) ** 2)

def kurt(obs):
    num = np.sqrt((1. / len(obs)) * np.sum((obs - mean(obs)) ** 4))
    denom = variance(obs) ** 2  # avoid losing precision with np.sqrt call
    return num / denom

The first two functions, mean and variance were successfully cross-validated with numpy.mean and numpy.var, respectively.

I attempted to cross-validate kurt with the following statement:

>>> kurtosis(testdata) == kurt(testdata)
False

Here's the output of both kurtosis functions:

>>> kurtosis(testdata)  # scipy.stats
-1.3

>>> kurt(testdata)  # my crappy attempt
0.65192024052026476

Where did I go wrong? Is scipy.stats.kurtosis doing something fancier than what's in the equation I've been given?

like image 732
Louis Thibault Avatar asked Dec 27 '22 14:12

Louis Thibault


1 Answers

By default, scipy.stats.kurtosis():

  1. Computes excess kurtosis (i.e. subtracts 3 from the result).
  2. Corrects for statistical biases (this affects some of the denominators).

Both behaviours are configurable through optional arguments to scipy.stats.kurtosis().

Finally, the np.sqrt() call in your method is unnecessary since there's no square root in the formula. Once I remove it, the output of your function matches what I get from kurtosis(testdata, False, False).

I attempted to cross-validate kurt with the following statement

You shouldn't be comparing floating-point numbers for exact equality. Even if the mathematical formulae are the same, small differences in how they are translated into computer code could affect the result of the computation.

Finally, if you're going to be writing numerical code, I strongly recommend reading What Every Computer Scientist Should Know About Floating-Point Arithmetic.

P.S. This is the function I've used:

In [51]: def kurt(obs):
   ....:     num = np.sum((obs - mean(obs)) ** 4)/ len(obs)
   ....:     denom = variance(obs) ** 2  # avoid losing precision with np.sqrt call
   ....:     return num / denom
like image 198
NPE Avatar answered Dec 29 '22 03:12

NPE