I have a Pandas Dataframe like so:
id  cat1  cat2  cat3  num1  num2
1   0     WN    29    2003  98
2   1     TX    12    755   76
3   0     WY    11    845   32
4   1     IL    19    935   46
I want to find out the correlation between cat1 and the columns cat3, num1 and num2, or between cat1 and num1 and num2, or between cat2 and cat1, cat3, num1, num2.
When I use df.corr() it gives the correlation between all the columns in the dataframe, but I want to see the correlation between just the selected columns detailed above.
How do I do that in Python pandas?
A Thousand thanks in advance for your answers.
A correlation is usually tested for two variables at a time, but you can test correlations between three or more variables.
A multiple correlation coefficient (R) gives the maximum degree of linear relationship that can be obtained between two or more independent variables and a single dependent variable.
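As a rough illustration of the multiple correlation coefficient, here is a minimal sketch using the sample data from the question. Treating num2 as the dependent variable and cat3 and num1 as the independent variables is an assumption made purely for the example. R is computed as the Pearson correlation between the observed values and the least-squares fitted values.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "cat1": [0, 1, 0, 1],
    "cat2": ["WN", "TX", "WY", "IL"],
    "cat3": [29, 12, 11, 19],
    "num1": [2003, 755, 845, 935],
    "num2": [98, 76, 32, 46],
})

# Assumed roles, for illustration only: num2 depends on cat3 and num1.
y = df["num2"].to_numpy(dtype=float)
X = df[["cat3", "num1"]].to_numpy(dtype=float)

# Add an intercept column and fit ordinary least squares.
X_design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
y_hat = X_design @ coef

# Multiple R: correlation between observed and fitted values.
multiple_r = np.corrcoef(y, y_hat)[0, 1]
print(multiple_r)
```

With only four rows the number itself means little; the point is just the mechanics.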
If you're interested in calculating the correlation between several variables in a Pandas DataFrame, you can simply use the .corr() function.
You can also get the correlation between all the columns of a pandas DataFrame. For this, call corr() on the entire DataFrame, which returns a DataFrame of pairwise correlation values between all the columns. Note that by default, corr() returns Pearson's correlation.
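For reference, a small sketch of the whole-DataFrame case, rebuilt from the sample data in the question. The string column cat2 is set aside first with select_dtypes, since Pearson's correlation is only defined for numeric data. That filtering step is my addition, not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "cat1": [0, 1, 0, 1],
    "cat2": ["WN", "TX", "WY", "IL"],
    "cat3": [29, 12, 11, 19],
    "num1": [2003, 755, 845, 935],
    "num2": [98, 76, 32, 46],
})

# Keep only the numeric columns (drops cat2, which holds strings).
numeric = df.select_dtypes(include="number")

# Pairwise Pearson correlation between every pair of numeric columns.
corr_matrix = numeric.corr(method="pearson")
print(corr_matrix)
```

The result is a square DataFrame indexed by column name on both axes, so a single value can be read with something like corr_matrix.loc["cat1", "num1"].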
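I tried the following and it worked:
# Column names are case-sensitive: 'cat1', not 'Cat1'.
features1 = ['cat1', 'cat2', 'cat3']
features2 = ['cat1', 'cat2', 'num1', 'num2']
# cat2 holds strings; numeric_only=True (pandas 1.5+) leaves it out of the result.
df[features1].corr(numeric_only=True)
df[features2].corr(numeric_only=True)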
This is a good way to select only the columns you need when your dataset has a very large number of variables.
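If what you want is one column against a handful of others, rather than a full square matrix, something along these lines may be closer to the question. It is a sketch on the question's sample data. The integer encoding of cat2 with pd.factorize is an assumption on my part: the codes follow order of appearance, so for a purely nominal column like a state code the resulting correlation is not meaningful and is shown only for the mechanics.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "cat1": [0, 1, 0, 1],
    "cat2": ["WN", "TX", "WY", "IL"],
    "cat3": [29, 12, 11, 19],
    "num1": [2003, 755, 845, 935],
    "num2": [98, 76, 32, 46],
})

targets = ["cat3", "num1", "num2"]

# Correlation of cat1 with each target column, returned as a Series.
cat1_vs_targets = df[targets].corrwith(df["cat1"])

# The same numbers, taken as one row of the sub-matrix.
same_numbers = df[["cat1"] + targets].corr().loc["cat1", targets]

# cat2 is text, so it needs a numeric encoding first (assumption: arbitrary codes).
cat2_codes = pd.Series(pd.factorize(df["cat2"])[0], index=df.index, name="cat2_code")
cat2_vs_rest = df[["cat1", "cat3", "num1", "num2"]].corrwith(cat2_codes)

print(cat1_vs_targets)
print(cat2_vs_rest)
```

corrwith avoids computing pairs you do not care about, while the .loc slice is handy when the full matrix already exists.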