I am trying to figure out how to calculate covariance with the Python Numpy function cov. When I pass it two one-dimensional arrays, I get back a 2x2 matrix of results. I don't know what to do with that. I'm not great at statistics, but I believe covariance in such a situation should be a single number. This is what I am looking for. I wrote my own:
    def cov(a, b):
        if len(a) != len(b):
            return
        a_mean = np.mean(a)
        b_mean = np.mean(b)
        sum = 0
        for i in range(0, len(a)):
            sum += ((a[i] - a_mean) * (b[i] - b_mean))
        return sum/(len(a)-1)
That works, but I figure the Numpy version is much more efficient, if I could figure out how to use it.
Does anybody know how to make the Numpy cov function perform like the one I wrote?
Thanks,
Dave
The covariance between two random variables is calculated by taking, for each observation, the product of the differences between each variable's value and its mean, summing all the products, and finally dividing by the number of values in the dataset. Dividing by the number of values n gives the population covariance; the sample covariance, which the function in the question computes and which numpy.cov returns by default, divides by n - 1 instead.
In NumPy, the covariance matrix of two given arrays can be computed with numpy.cov(). Pass it the two arrays and it returns their covariance matrix.
To compute the covariance between two variables, take the dot product of their deviation vectors (each value minus its variable's mean) and divide by the sample size, or by the sample size minus one for the sample covariance.
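The dot-product description above can be sketched in a few lines. The arrays here are made-up illustration data, not values from the question, and the variable names are arbitrary. The sketch assumes numpy.cov's default normalization (n - 1) and uses its ddof argument to switch to dividing by n.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 4.0, 5.0, 9.0])

# Deviation vectors: each value minus the mean of its own array.
da = a - a.mean()
db = b - b.mean()

n = len(a)
pop_cov = np.dot(da, db) / n           # population covariance (divide by n)
sample_cov = np.dot(da, db) / (n - 1)  # sample covariance (divide by n - 1)

# numpy.cov divides by n - 1 by default; ddof=0 makes it divide by n.
np_sample = np.cov(a, b)[0, 1]
np_pop = np.cov(a, b, ddof=0)[0, 1]

print(sample_cov, np_sample)
print(pop_cov, np_pop)
```

The two numbers on each printed line should agree up to floating-point rounding.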
When a and b are 1-dimensional sequences, numpy.cov(a,b)[0][1] is equivalent to your cov(a,b).
The 2x2 array returned by np.cov(a,b) has elements equal to
    cov(a,a)  cov(a,b)
    cov(a,b)  cov(b,b)
(where, again, cov is the function you defined above.)
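A quick way to check this is to compare every entry of the matrix against the hand-written function. The sketch below repeats the function from the question (with the accumulator renamed to total so it does not shadow the built-in sum) and uses made-up input lists that are not from the original post.

```python
import numpy as np

def cov(a, b):
    """Sample covariance of two equal-length sequences, as in the question."""
    if len(a) != len(b):
        return None
    a_mean = np.mean(a)
    b_mean = np.mean(b)
    total = 0.0
    for i in range(len(a)):
        total += (a[i] - a_mean) * (b[i] - b_mean)
    return total / (len(a) - 1)

# Hypothetical inputs, for illustration only.
a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 5.0, 9.0]

m = np.cov(a, b)  # 2x2 covariance matrix

# Each entry of the matrix matches a call to the hand-written function.
expected = np.array([[cov(a, a), cov(a, b)],
                     [cov(a, b), cov(b, b)]])
matches = np.allclose(m, expected)

# The single number asked for in the question is the off-diagonal entry.
single_value = m[0][1]
print(matches, single_value)
```

The diagonal entries are the sample variances of a and b, which is why the result is a 2x2 matrix rather than a single number.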