
Diagonal element for covariance matrix not 1 pandas/numpy

I have the following dataframe:

   A  B
0  1  5
1  2  6
2  3  7
3  4  8

I want to calculate the covariance of the two columns:

a = df.iloc[:,0].values

b = df.iloc[:,1].values

Computing the covariance with numpy:

import numpy

numpy.cov(a, b)

I get:

array([[ 1.66666667,  1.66666667],
       [ 1.66666667,  1.66666667]])

Shouldn't the diagonal elements be 1? How do I get the diagonal elements to be 1?

Prgmr asked Oct 26 '25 20:10


2 Answers

No, they shouldn't. I think you are confusing covariance with correlation. The two are different quantities.
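For reference, the standard textbook definitions can be written as follows. The symbols (X, Y, their means and standard deviations) are notation chosen here for illustration and do not come from the original post:

```latex
\begin{align}
\operatorname{cov}(X, Y) &= \operatorname{E}\big[(X - \mu_X)(Y - \mu_Y)\big],
  & \operatorname{cov}(X, X) &= \operatorname{var}(X) = \sigma_X^2, \\
\operatorname{corr}(X, Y) &= \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y},
  & \operatorname{corr}(X, X) &= \frac{\sigma_X^2}{\sigma_X \, \sigma_X} = 1.
\end{align}
```

So the diagonal is 1 only after dividing by the standard deviations, which is the step that turns a covariance into a correlation.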

What you see on the diagonal is simply the variance of each variable, because the covariance of a variable with itself is its variance. See the Wikipedia articles on covariance and correlation for the formulas.
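A quick way to check this on the question's data is sketched below. It rebuilds the two columns as NumPy arrays (an assumption, since the original `df` is not constructed in the post) and compares the diagonal of `np.cov` with the sample variances:

```python
import numpy as np

# The two columns from the question, rebuilt by hand for illustration.
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

# np.cov uses the sample estimator (divides by N - 1) by default.
cov = np.cov(a, b)

# The diagonal of the covariance matrix is each variable's sample variance.
variances = np.array([a.var(ddof=1), b.var(ddof=1)])
diag_is_variance = np.allclose(np.diag(cov), variances)

# The correlation matrix, by contrast, has ones on its diagonal.
corr = np.corrcoef(a, b)
diag_is_one = np.allclose(np.diag(corr), 1.0)

print(diag_is_variance, diag_is_one)
```

Note the `ddof=1`: `ndarray.var` divides by N by default, while `np.cov` divides by N - 1, so the two only agree when the same convention is used.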

Vivek Kalyanarangan answered Oct 28 '25 12:10


Use pd.DataFrame.corr
There is also no need to use NumPy here, since the built-in pandas method does the job. The diagonal of the correlation matrix is 1 because each series is normalized by its own standard deviation. (The off-diagonal entries are also 1 in this example because B is exactly A + 4.)

df.corr() 

     A    B
A  1.0  1.0
B  1.0  1.0

whereas pd.DataFrame.cov gives you

df.cov()

          A         B
A  1.666667  1.666667
B  1.666667  1.666667

The other posters are correct. Dividing the covariance matrix by the standard deviations along both the columns and the rows gives the correlation matrix:

df.cov().div(df.std()).div(df.std(), 0)

     A    B
A  1.0  1.0
B  1.0  1.0
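The same normalization can be sketched in plain NumPy, for readers who start from `numpy.cov` as the question does. The arrays are rebuilt by hand here, and `corr_manual` is just an illustrative name:

```python
import numpy as np

# The two columns from the question, rebuilt by hand for illustration.
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

cov = np.cov(a, b)

# Standard deviations are the square roots of the diagonal variances.
std = np.sqrt(np.diag(cov))

# Divide entry (i, j) by std[i] * std[j] to turn covariance into correlation.
corr_manual = cov / np.outer(std, std)

# This should agree with NumPy's built-in correlation matrix.
same_as_corrcoef = np.allclose(corr_manual, np.corrcoef(a, b))
print(corr_manual)
print(same_as_corrcoef)
```

Taking the standard deviations from the covariance diagonal keeps the N - 1 convention consistent, so there is no `ddof` mismatch to worry about.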

piRSquared answered Oct 28 '25 10:10


