Dropping 'nan' with Pearson's r in scipy/pandas

Tags: pandas, nan, scipy, pearson

Quick question: is there a way to use 'dropna' with the Pearson's r function in scipy? I'm using it together with pandas, and some of my data contains NaN values. I know you used to be able to suppress 'nan' with Spearman's r in older versions of scipy, but that functionality now seems to be missing.

To my mind this seems like a step backwards, so I wonder whether I'm missing something obvious.

My code:

import scipy.stats as sp
correlation = []
for i in range(len(frame3.columns)):
    correlation.append(sp.pearsonr(frame3.iloc[:, i], control['CONTROL']))


asked Aug 11 '16 10:08

Lodore66

2 Answers

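You can use np.isnan to build a mask of every position where either series is NaN, then pass only the remaining values to pearsonr: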

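import numpy as np
import scipy.stats as sp

correlation = []
for i in range(len(frame3.columns)):
    x, y = frame3.iloc[:, i].values, control['CONTROL'].values
    # True wherever x or y is NaN (use np.isnan; ndarrays have no .isnan() method)
    nas = np.logical_or(np.isnan(x), np.isnan(y))
    corr = sp.pearsonr(x[~nas], y[~nas])
    correlation.append(corr)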
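A self-contained sketch of the masking approach above, assuming small hypothetical frame3 and control DataFrames (the numbers are made up for illustration). The mask is computed per column, so each correlation may be based on a different number of rows.

```python
import numpy as np
import pandas as pd
import scipy.stats as sp

# Hypothetical stand-ins for the asker's frame3 and control DataFrames.
frame3 = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
    "b": [2.0, np.nan, 1.0, 3.0, np.nan, 8.0],
})
control = pd.DataFrame({"CONTROL": [1.5, 2.5, 3.0, np.nan, 5.5, 6.0]})

correlation = []
for col in frame3.columns:
    x = frame3[col].to_numpy(dtype=float)
    y = control["CONTROL"].to_numpy(dtype=float)
    # True wherever either series is missing a value at that position.
    nas = np.logical_or(np.isnan(x), np.isnan(y))
    # Keep only complete pairs; pearsonr needs at least two of them.
    correlation.append(sp.pearsonr(x[~nas], y[~nas]))

for col, (r, p) in zip(frame3.columns, correlation):
    print(col, round(r, 4), round(p, 4))
```

The coefficients should match pandas' Series.corr, which also excludes missing values pairwise, while pearsonr additionally gives you the p-value.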


answered Oct 11 '22 23:10

Ami Tavory

You can also create a temporary DataFrame and use pandas' built-in method for computing the Pearson correlation (DataFrame.corr excludes NA/null values), or call .dropna on the temporary DataFrame to drop null values before passing the columns to sp.pearsonr:

for col in frame3.columns:
    correlation.append(frame3[col].to_frame(name='3').join(control['CONTROL']).corr()['3']['CONTROL'])
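A sketch of the .dropna variant described above, again with hypothetical data: for each column, build a temporary two-column DataFrame, drop incomplete rows, and pass what is left to sp.pearsonr so the p-value is kept. For comparison it also uses DataFrame.corrwith, a pandas method not mentioned in the answer, which returns only the coefficients.

```python
import numpy as np
import pandas as pd
import scipy.stats as sp

# Hypothetical example data; frame3 and control mirror the question's names.
frame3 = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
    "b": [2.0, np.nan, 1.0, 3.0, np.nan, 8.0],
})
control = pd.DataFrame({"CONTROL": [1.5, 2.5, 3.0, np.nan, 5.5, 6.0]})

results = {}
for col in frame3.columns:
    # Temporary two-column frame aligned on the index; dropna() removes
    # every row where either value is missing.
    tmp = pd.concat([frame3[col], control["CONTROL"]], axis=1).dropna()
    r, p = sp.pearsonr(tmp[col].to_numpy(), tmp["CONTROL"].to_numpy())
    results[col] = (r, p)

# pandas-only alternative: correlate each column with the Series,
# excluding missing values pairwise (coefficients only, no p-values).
r_only = frame3.corrwith(control["CONTROL"])
print(results)
print(r_only)
```

Because pd.concat aligns on the index, this assumes frame3 and control share an index; rows present in only one of them become NaN and are dropped along with the genuine gaps.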

answered Oct 11 '22 22:10

Daniel Gibson
