
Python pandas empty correlation matrix

Tags: python, pandas

I am running Python 2.7.6, pandas 0.13.1. I am unable to compute a correlation matrix from a DataFrame, and I'm not sure why. Here is my example DataFrame:

    In [24]: foo
    Out[24]:
                           A             B            C
    2011-10-12   0.006204908 -0.0009503677  0.003480105
    2011-10-13    0.00234903 -0.0005122284 -0.001738786
    2011-10-14    0.01045599   0.000346268  0.002378351
    2011-10-17   0.003239088   0.001246239 -0.002651856
    2011-10-18   0.001717674 -0.0001738079  0.002013923
    2011-10-19  0.0001919342  6.399505e-05 -0.001311259
    2011-10-20  0.0007430615   0.001186141  0.001919222
    2011-10-21   -0.01075129    -0.0015123  0.000807017
    2011-10-24   -0.00819597 -0.0005124197  0.003037654
    2011-10-25   -0.01604287   0.001157013 -0.001227516

    [10 rows x 3 columns]

Now I'll try to compute the correlation:

    In [27]: foo.corr()
    Out[27]:
    Empty DataFrame
    Columns: []
    Index: []

    [0 rows x 0 columns]

On the other hand, I can compute correlations of each column to each other column. For example:

    In [31]: foo['A'].corr(foo['B'])
    Out[31]: 0.048578514633405255

Any idea what might be causing this issue? Thanks a lot.

Version Info

    In [34]: import pandas as pd

    In [35]: pd.__version__
    Out[35]: '0.13.1'
Max asked Mar 18 '14 13:03



1 Answer

As Jeff mentioned in the comments, the problem was that my columns had the object dtype. For future reference: even if the values look numeric, check the column dtypes (e.g. with foo.dtypes) and convert them to a numeric type (e.g. foo = foo.astype(float)) before computing the correlation matrix. Note that astype returns a new DataFrame, so the result has to be assigned back.
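A minimal sketch of the check and the fix, for anyone who wants to reproduce it. The frame below is hypothetical: it reuses the first three rows from the question and is deliberately built with dtype=object, since the question does not show how foo was originally constructed. How corr() treats object columns may differ between pandas versions, so the sketch only relies on the dtype check and the conversion.

```python
import pandas as pd

# Hypothetical stand-in for `foo`: numeric-looking values stored as object dtype.
raw = pd.DataFrame(
    {
        "A": [0.006204908, 0.00234903, 0.01045599],
        "B": [-0.0009503677, -0.0005122284, 0.000346268],
        "C": [0.003480105, -0.001738786, 0.002378351],
    },
    index=pd.to_datetime(["2011-10-12", "2011-10-13", "2011-10-14"]),
    dtype=object,
)

# Step 1: inspect the dtypes. Every column reports `object`, not float64.
print(raw.dtypes)

# No column counts as numeric, which is consistent with the empty result
# seen in the question.
numeric_cols = raw.select_dtypes(include=["number"]).columns
print(len(numeric_cols))

# Step 2: convert to float. astype returns a new frame, so keep the result.
fixed = raw.astype(float)
print(fixed.dtypes)

# Step 3: the correlation matrix now has one row and column per variable.
corr = fixed.corr()
print(corr.shape)
```

If some cells contain strings that cannot be parsed as numbers, astype(float) raises an error; in that case the offending values need to be cleaned first.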

Max answered Sep 19 '22 20:09