numpy corrcoef - compute correlation matrix while ignoring missing data

I am trying to compute a correlation matrix for several variables, some of which contain NaN values. I'm using numpy.corrcoef. For element (i, j) of the output correlation matrix, I'd like the correlation to be calculated using all rows where both variable i and variable j have a value.

This is what I have now:

    In [20]: df_counties = pd.read_sql("SELECT Median_Age, Rpercent_2008, overall_LS, population_density FROM countyVotingSM2", db_eng)

    In [21]: np.corrcoef(df_counties, rowvar=False)
    Out[21]:
    array([[ 1.        ,         nan,         nan, -0.10998411],
           [        nan,         nan,         nan,         nan],
           [        nan,         nan,         nan,         nan],
           [-0.10998411,         nan,         nan,  1.        ]])

Too many nan's :(

asked Jul 24 '15 20:07

Selah

1 Answer

One of the main features of pandas is that it is NaN-friendly. To calculate the correlation matrix, simply call df_counties.corr(), which computes each pairwise correlation while excluding missing values. Below is an example demonstrating that df.corr() is NaN-tolerant whereas np.corrcoef is not.

    import pandas as pd
    import numpy as np

    # data
    # ==============================
    np.random.seed(0)
    df = pd.DataFrame(np.random.randn(100,5), columns=list('ABCDE'))
    df[df < 0] = np.nan
    df

             A       B       C       D       E
    0   1.7641  0.4002  0.9787  2.2409  1.8676
    1      NaN  0.9501     NaN     NaN  0.4106
    2   0.1440  1.4543  0.7610  0.1217  0.4439
    3   0.3337  1.4941     NaN  0.3131     NaN
    4      NaN  0.6536  0.8644     NaN  2.2698
    5      NaN  0.0458     NaN  1.5328  1.4694
    6   0.1549  0.3782     NaN     NaN     NaN
    7   0.1563  1.2303  1.2024     NaN     NaN
    8      NaN     NaN     NaN  1.9508     NaN
    9      NaN     NaN  0.7775     NaN     NaN
    ..     ...     ...     ...     ...     ...
    90     NaN  0.8202  0.4631  0.2791  0.3389
    91  2.0210     NaN     NaN  0.1993     NaN
    92     NaN     NaN     NaN  0.1813     NaN
    93  2.4125     NaN     NaN     NaN  0.2515
    94     NaN     NaN     NaN     NaN  1.7389
    95  0.9944  1.3191     NaN  1.1286  0.4960
    96  0.7714  1.0294     NaN     NaN  0.8626
    97     NaN  1.5133  0.5531     NaN  0.2205
    98     NaN     NaN  1.1003  1.2980  2.6962
    99     NaN     NaN     NaN     NaN     NaN

    [100 rows x 5 columns]

    # calculations
    # ================================
    df.corr()

            A       B       C       D       E
    A  1.0000  0.2718  0.2678  0.2822  0.1016
    B  0.2718  1.0000 -0.0692  0.1736 -0.1432
    C  0.2678 -0.0692  1.0000 -0.3392  0.0012
    D  0.2822  0.1736 -0.3392  1.0000  0.1562
    E  0.1016 -0.1432  0.0012  0.1562  1.0000

    np.corrcoef(df, rowvar=False)

    array([[ nan,  nan,  nan,  nan,  nan],
           [ nan,  nan,  nan,  nan,  nan],
           [ nan,  nan,  nan,  nan,  nan],
           [ nan,  nan,  nan,  nan,  nan],
           [ nan,  nan,  nan,  nan,  nan]])
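If you need to stay in NumPy, the same pairwise idea can be sketched by hand: for each pair of columns, keep only the rows where both values are present and call np.corrcoef on that subset. This is a minimal sketch, not part of either library; the helper name pairwise_corrcoef is hypothetical, and it assumes the input is 2-D with variables in columns (as with rowvar=False).

```python
import numpy as np
import pandas as pd


def pairwise_corrcoef(data):
    """Pearson correlation for every column pair, using only the rows
    where both columns are non-NaN. (Hypothetical helper for illustration.)"""
    x = np.asarray(data, dtype=float)
    n_cols = x.shape[1]
    out = np.full((n_cols, n_cols), np.nan)
    for i in range(n_cols):
        for j in range(i, n_cols):
            # Rows where both variable i and variable j have a value
            valid = ~np.isnan(x[:, i]) & ~np.isnan(x[:, j])
            if valid.sum() < 2:
                continue  # not enough shared observations; leave NaN
            r = np.corrcoef(x[valid, i], x[valid, j])[0, 1]
            out[i, j] = out[j, i] = r
    return out


# Same data as in the example above
np.random.seed(0)
df = pd.DataFrame(np.random.randn(100, 5), columns=list('ABCDE'))
df[df < 0] = np.nan

result = pairwise_corrcoef(df)
print(result)

# Compare against pandas, which also excludes missing values pairwise
print(np.allclose(result, df.corr().to_numpy()))
```

The double loop is fine for a handful of columns; for wide data df.corr() is the more practical choice. If some pairs share very few rows, df.corr() also accepts a min_periods argument to require a minimum number of observations per pair.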

answered Sep 21 '22 05:09

Jianxun Li
