Quick question: Is there a way to use 'dropna' with the Pearson's r function in scipy? I'm using it in conjunction with pandas, and some of my data has holes in it. I know you used to be able suppress 'nan' with Spearman's r in older versions of scipy, but that functionality is now missing.
To my mind, this seems like a disimprovement, so I wonder if I'm missing something obvious.
My code:
for i in range(len(frame3.columns)):
correlation.append(sp.pearsonr(frame3.iloc[ :,i], control['CONTROL']))
The Pearson correlation coefficient [1] measures the linear relationship between two datasets. Like other correlation coefficients, this one varies between -1 and +1 with 0 implying no correlation. Correlations of -1 or +1 imply an exact linear relationship. Positive correlations imply that as x increases, so does y.
The pearsonr() SciPy function can be used to calculate the Pearson's correlation coefficient between two data samples with the same length. We can calculate the correlation between the two variables in our test problem.
You can use np.isnan
like this:
for i in range(len(frame3.columns)):
x, y = frame3.iloc[ :,i].values, control['CONTROL'].values
nas = np.logical_or(x.isnan(), y.isnan())
corr = sp.pearsonr(x[~nas], y[~nas])
correlation.append(corr)
You can also try creating temporary dataframe, and used pandas built-in method for computing pearson correlation, or use the .dropna method in the temporary dataframe to drup null values before using sp.pearsonr
for col in frame3.columns:
correlation.append(frame3[col].to_frame(name='3').join(control['CONTROL']).corr()['3']['CONTROL'])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With