
Python package that supports weighted covariance computation

Is there a Python statistical package that supports computing a weighted covariance (i.e., each observation has a weight)? Unfortunately, numpy.cov does not support weights.

Preferably something that works within the numpy/scipy framework (i.e., can operate on numpy arrays to speed up the computation).

Thanks a lot!

asked Jul 11 '12 at 23:07 by CuriousMind



1 Answer

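statsmodels provides a weighted covariance calculation through the DescrStatsW class in statsmodels.stats.weightstats.

The same quantity can also be calculated directly with numpy, which the script below uses as a cross-check: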

# -*- coding: utf-8 -*-
"""descriptive statistic with case weights

Author: Josef Perktold
"""

import numpy as np
from statsmodels.stats.weightstats import DescrStatsW


np.random.seed(987467)
x = np.random.multivariate_normal([0, 1.], [[1., 0.5], [0.5, 1]], size=20)
weights = np.random.randint(1, 4, size=20)

xlong = np.repeat(x, weights, axis=0)

ds = DescrStatsW(x, weights=weights)

print('cov statsmodels')
print(ds.cov)

# weighted covariance computed directly, normalized by the sum of weights
ds_cov = np.dot(ds.weights * ds.demeaned.T, ds.demeaned) / ds.sum_weights

print('\nddof=0')
print(ds_cov)
print(np.cov(xlong.T, bias=1))

# same computation with the ddof=1 normalization
ds_cov0 = np.dot(ds.weights * ds.demeaned.T, ds.demeaned) / \
              (ds.sum_weights - 1)
print('\nddof=1')
print(ds_cov0)
print(np.cov(xlong.T, bias=0))

This prints:

cov statsmodels
[[ 0.43671986  0.06551506]
 [ 0.06551506  0.66281218]]

ddof=0
[[ 0.43671986  0.06551506]
 [ 0.06551506  0.66281218]]
[[ 0.43671986  0.06551506]
 [ 0.06551506  0.66281218]]

ddof=1
[[ 0.44821249  0.06723914]
 [ 0.06723914  0.68025461]]
[[ 0.44821249  0.06723914]
 [ 0.06723914  0.68025461]]
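
If statsmodels is not available, the direct calculation can be wrapped in a small numpy-only function. The following is a minimal sketch, not part of numpy or statsmodels: the name weighted_cov and its ddof parameter are hypothetical, and the weights are assumed to be frequency (case) weights, as in the script above. It checks itself against np.cov on the expanded data.

```python
import numpy as np


def weighted_cov(x, weights, ddof=0):
    """Weighted covariance of the columns of x (rows are observations).

    Hypothetical helper for illustration; weights are treated as
    frequency (case) weights.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)
    sum_w = w.sum()
    mean = np.dot(w, x) / sum_w      # weighted column means
    demeaned = x - mean              # center each column
    # weighted sum of outer products, normalized by (sum of weights - ddof)
    return np.dot(w * demeaned.T, demeaned) / (sum_w - ddof)


rng = np.random.RandomState(0)
x = rng.normal(size=(20, 2))
weights = rng.randint(1, 4, size=20)

# repeating each row by its weight gives the equivalent unweighted data
xlong = np.repeat(x, weights, axis=0)

cov0 = weighted_cov(x, weights, ddof=0)
cov1 = weighted_cov(x, weights, ddof=1)
print(np.allclose(cov0, np.cov(xlong.T, bias=True)))
print(np.allclose(cov1, np.cov(xlong.T)))
```

Comparing against the repeated-row data is the same check the script above uses, so integer weights are needed for the comparison even though the function itself accepts any non-negative weights.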

Editorial note

The initial answer pointed out a bug in statsmodels that has since been fixed.
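
For readers on a newer numpy: releases from 1.10 onward accept fweights (integer frequency weights) and aweights (observation weights) arguments in np.cov, which was not the case when the question was asked. The sketch below assumes such a version is installed and uses made-up data; it compares the fweights result with np.cov on the expanded data.

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.normal(size=(20, 2))
weights = rng.randint(1, 4, size=20)   # fweights must be integers

# equivalent unweighted data: each row repeated by its weight
xlong = np.repeat(x, weights, axis=0)

# rowvar=False: rows are observations, columns are variables
cov_ddof1 = np.cov(x, rowvar=False, fweights=weights)          # default ddof=1
cov_ddof0 = np.cov(x, rowvar=False, fweights=weights, ddof=0)

print(np.allclose(cov_ddof1, np.cov(xlong.T)))
print(np.allclose(cov_ddof0, np.cov(xlong.T, bias=True)))
```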

answered Oct 25 '22 at 04:10 by Josef