
pandas correlation one column vs all

I'm trying to get the correlation between a single column and the rest of the numerical columns of the dataframe, but I'm stuck.

I'm trying with this:

corr = IM['imdb_score'].corr(IM)

But I get the error:

operands could not be broadcast together with shapes

which I assume is because I'm trying to find a correlation between a vector (my imdb_score column) and a dataframe of several columns.

How can this be fixed?

DiegoIE asked Mar 28 '26 11:03



1 Answer

The most efficient method is to use corrwith, which correlates every column of the dataframe with a given Series.

Example:

df.corrwith(df['A'])

Setup of example data:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(10, size=(5, 5)), columns=list('ABCDE'))

#    A  B  C  D  E
# 0  7  2  0  0  0
# 1  4  4  1  7  2
# 2  6  2  0  6  6
# 3  9  8  0  2  1
# 4  6  0  9  7  7

Output:

A    1.000000
B    0.526317
C   -0.209734
D   -0.720400
E   -0.326986
dtype: float64
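
Applied to the dataframe in the question, this might look like the sketch below. Only the name IM and the imdb_score column come from the question; the other columns and all values are made up for illustration. It assumes you want numeric columns only (selected with select_dtypes) and that the trivial self-correlation of imdb_score should be dropped.

```python
import pandas as pd

# Hypothetical stand-in for the asker's IM dataframe.
# Only 'imdb_score' is taken from the question; everything else is invented.
IM = pd.DataFrame({
    "imdb_score": [7.9, 6.1, 8.4, 5.5, 7.0],
    "budget": [200, 50, 150, 20, 90],
    "duration": [148, 95, 169, 88, 120],
    "title": ["a", "b", "c", "d", "e"],  # non-numeric, must be excluded
})

# Keep only the numeric columns so corrwith has nothing non-numeric to process.
numeric = IM.select_dtypes(include="number")

# Correlate every remaining numeric column with imdb_score,
# leaving out imdb_score itself (its self-correlation is always 1).
corr = numeric.drop(columns="imdb_score").corrwith(numeric["imdb_score"])

print(corr.sort_values(ascending=False))
```

The result is a Series indexed by column name, so it can be sorted or filtered like any other Series.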
mozway answered Apr 02 '26 20:04



