I have a homework problem in which I'm supposed to write a function for Kurtosis as descirbed here:
The theta in the denominator is the standard deviation (square-root of the variance) and the x-with-the-bar in the numerator is the mean of x
.
I've implemented the function as follows:
import numpy as np
from scipy.stats import kurtosis
testdata = np.array([1, 2, 3, 4, 5])
def mean(obs):
return (1. / len(obs)) * np.sum(obs)
def variance(obs):
return (1. / len(obs)) * np.sum((obs - mean(obs)) ** 2)
def kurt(obs):
num = np.sqrt((1. / len(obs)) * np.sum((obs - mean(obs)) ** 4))
denom = variance(obs) ** 2 # avoid losing precision with np.sqrt call
return num / denom
The first two functions, mean
and variance
were successfully cross-validated with numpy.mean
and numpy.var
, respectively.
I attempted to cross-validate kurt
with the following statement:
>>> kurtosis(testdata) == kurt(testdata)
False
Here's the output of both kurtosis functions:
>>> kurtosis(testdata) # scipy.stats
-1.3
>>> kurt(testdata) # my crappy attempt
0.65192024052026476
Where did I go wrong? Is scipy.stats.kurtosis
doing something fancier than what's in the equation I've been given?
By default, scipy.stats.kurtosis()
:
Both behaviours are configurable through optional arguments to scipy.stats.kurtosis()
.
Finally, the np.sqrt()
call in your method is unnecessary since there's no square root in the formula. Once I remove it, the output of your function matches what I get from kurtosis(testdata, False, False)
.
I attempted to cross-validate kurt with the following statement
You shouldn't be comparing floating-point numbers for exact equality. Even if the mathematical formulae are the same, small differences in how they are translated into computer code could affect the result of the computation.
Finally, if you're going to be writing numerical code, I strongly recommend reading What Every Computer Scientist Should Know About Floating-Point Arithmetic.
P.S. This is the function I've used:
In [51]: def kurt(obs):
....: num = np.sum((obs - mean(obs)) ** 4)/ len(obs)
....: denom = variance(obs) ** 2 # avoid losing precision with np.sqrt call
....: return num / denom
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With