Pearson correlation and nan values

Question

I have two CSV files with hundreds of columns, and I want to calculate the Pearson correlation coefficient and p-value for each pair of matching columns in the two files. The problem is that when a column contains missing data ("NaN"), I get an error. When I use ".dropna" to remove the NaN values from each column, the resulting X and Y sometimes have different lengths (depending on which NaN values were removed), and I receive this error:

"ValueError: operands could not be broadcast together with shapes (1020,) (1016,)"
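The length mismatch can be reproduced on a tiny scale. The sketch below uses made-up values (not the asker's data) to show that calling dropna on each column separately removes different rows from each, so the two results no longer pair up. Even when the lengths happen to match, the surviving values may come from different rows.

```python
import numpy as np
import pandas as pd

# Made-up columns: NaN values sit in different rows of x and y.
x = pd.Series([1.0, np.nan, 3.0, 4.0, 5.0])
y = pd.Series([1.0, 2.0, np.nan, np.nan, 5.0])

x_clean = x.dropna()  # drops row 1 only
y_clean = y.dropna()  # drops rows 2 and 3

# The cleaned columns have different lengths, so they cannot be paired
# element by element for a correlation.
lengths = (len(x_clean), len(y_clean))
```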

Question: If row #8 in one CSV is NaN, is there a way to remove the same row from the other CSV as well, so that the analysis for every column uses only rows that have values in both CSV files?

import pandas as pd
import scipy
import csv
import numpy as np
from scipy import stats


df = pd.read_csv ("D:/Insitu-Daily.csv",header = None)
dg = pd.read_csv ("D:/Model-Daily.csv",header = None)

pearson_corr_set = []
pearson_p_set = []


for i in range(1,df.shape[1]):
    X= df[i].dropna(axis=0, how='any')
    Y= dg[i].dropna(axis=0, how='any')

    [pearson_corr, pearson_p] = scipy.stats.stats.pearsonr(X, Y)
    pearson_corr_set = np.append(pearson_corr_set,pearson_corr)
    pearson_p_set = np.append(pearson_p_set,pearson_p)

with open('D:/Results.csv','wb') as file:
    str1 = ",".join(str(i) for i in np.asarray(pearson_corr_set))
    file.write(str1)
    file.write('\n')
    str1 = ",".join(str(i) for i in np.asarray(pearson_p_set))
    file.write(str1)
    file.write('\n')

c-wilson · Accepted Answer

Instead of dropna, try using isnan and boolean indexing:

for i in range(1, df.shape[1]):
    df_sub = df[i]
    dg_sub = dg[i]
    mask = ~np.isnan(df_sub) & ~np.isnan(dg_sub)
    # mask is True for rows where BOTH df[i] and dg[i] are not NaN.
    X = df_sub[mask]  # 1D Series of length mask.sum()
    Y = dg_sub[mask]  # same rows, taken from dg
    # ... your code continues.
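To make the masking step concrete, here is a minimal self-contained sketch on two small made-up DataFrames standing in for the CSV files. The data values and the helper name paired_pearson are illustrative assumptions, not part of the original question. It uses the pandas notna method, which is equivalent to ~np.isnan for float columns, and then calls scipy.stats.pearsonr on the rows that survive in both columns.

```python
import numpy as np
import pandas as pd
from scipy import stats


def paired_pearson(x, y):
    """Pearson r and p-value using only rows where both x and y are present.

    Hypothetical helper for illustration; x and y are pandas Series
    that share the same index.
    """
    mask = x.notna() & y.notna()  # True where neither value is NaN
    return stats.pearsonr(x[mask], y[mask])


# Made-up data standing in for the two CSV files (header=None gives integer column labels).
df = pd.DataFrame({0: [1, 2, 3, 4, 5], 1: [1.0, 2.0, np.nan, 4.0, 5.0]})
dg = pd.DataFrame({0: [1, 2, 3, 4, 5], 1: [2.0, np.nan, 6.0, 8.0, 10.0]})

results = []
for i in range(1, df.shape[1]):
    r, p = paired_pearson(df[i], dg[i])
    results.append((i, r, p))
```

In this example, rows 1 and 2 are dropped from both columns because each has a NaN on one side, so the correlation is computed from the three remaining row pairs.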

Hope that helps!

Tags:

python

arrays

nan

numpy

pearson-correlation

Amy

1 Answers

c-wilson
