I have a pandas DataFrame like so:
id    cat1    cat2    cat3    num1    num2
1     0       WN      29      2003    98
2     1       TX      12      755     76
3     0       WY      11      845     32
4     1       IL      19      935     46
I want to find out the correlation between cat1 and column cat3, num1 and num2
or between cat1 and num1 and num2
or between cat2 and cat1, cat3, num1, num2
When I use df.corr() it gives the correlation between all the columns in the DataFrame, but I want to see the correlation between just the selected columns detailed above.
How do I do that in Python pandas?
A Thousand thanks in advance for your answers.
A correlation is usually tested for two variables at a time, but you can test correlations between three or more variables.
A multiple correlation coefficient (R) yields the maximum degree of linear relationship that can be obtained between two or more independent variables and a single dependent variable.
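As a rough illustration of the multiple correlation coefficient, here is a minimal sketch that computes R as the correlation between a dependent variable and its least-squares fit on the independent variables. Treating cat3 as the dependent variable and num1 and num2 as the independent ones is only an assumption made for the example, and the data is the four-row sample from the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "cat1": [0, 1, 0, 1],
    "cat2": ["WN", "TX", "WY", "IL"],
    "cat3": [29, 12, 11, 19],
    "num1": [2003, 755, 845, 935],
    "num2": [98, 76, 32, 46],
})

# Assumption for illustration: cat3 is the dependent variable,
# num1 and num2 are the independent variables.
y = df["cat3"].to_numpy(dtype=float)
X = df[["num1", "num2"]].to_numpy(dtype=float)

# Add an intercept column and fit ordinary least squares
X1 = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
y_hat = X1 @ coef

# R is the Pearson correlation between observed and fitted values
R = np.corrcoef(y, y_hat)[0, 1]
print(R)
```

With only four rows the value is not meaningful statistically; the point is just to show how R relates to the pairwise correlations that corr() returns.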
If you're interested in calculating the correlation between several variables in a pandas DataFrame, you can simply use the .corr() method.
You can also get the correlation between all the columns of a pandas DataFrame. For this, apply corr() function on the entire DataFrame which will result in a DataFrame of pair-wise correlation values between all the columns. Note that by default, the corr() function returns Pearson's correlation.
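A minimal sketch of this on the sample data from the question. Non-numeric columns such as cat2 are selected out first with select_dtypes, since Pearson's correlation is only defined for numeric data. The method argument switches to a rank-based measure, shown here with "spearman".

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "cat1": [0, 1, 0, 1],
    "cat2": ["WN", "TX", "WY", "IL"],
    "cat3": [29, 12, 11, 19],
    "num1": [2003, 755, 845, 935],
    "num2": [98, 76, 32, 46],
})

# Keep only numeric columns; the string column cat2 is left out
numeric = df.select_dtypes(include="number")

# Pair-wise Pearson correlation (the default)
pearson = numeric.corr()

# Same call with Spearman rank correlation instead
spearman = numeric.corr(method="spearman")

print(pearson.round(3))
```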
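I tried the following and it worked:
features1 = ['cat1', 'cat2', 'cat3']
features2 = ['cat1', 'cat2', 'num1', 'num2']  # column names are case-sensitive
# cat2 holds strings; numeric_only=True (pandas 1.5+) leaves it out of the result
df[features1].corr(numeric_only=True)
df[features2].corr(numeric_only=True)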
This is a good way to select only the columns you need when your dataset has a very large number of variables.
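If you only want one column against a few others (for example cat1 against cat3, num1 and num2), rather than a full square matrix, the sketch below shows two possible ways: slicing the correlation matrix with .loc, or using corrwith. For the string column cat2, one common workaround is to one-hot encode it first. That choice, and the cat2_ prefix, are assumptions for illustration rather than the only way to handle a nominal column.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "cat1": [0, 1, 0, 1],
    "cat2": ["WN", "TX", "WY", "IL"],
    "cat3": [29, 12, 11, 19],
    "num1": [2003, 755, 845, 935],
    "num2": [98, 76, 32, 46],
})

targets = ["cat3", "num1", "num2"]

# Option 1: compute the full matrix, then keep one row and chosen columns
numeric = df.select_dtypes(include="number")
sub = numeric.corr().loc[["cat1"], targets]

# Option 2: correlate each target column with cat1 directly
cw = df[targets].corrwith(df["cat1"])

# cat2 is nominal text, so encode it as 0/1 indicator columns first
# (one-hot encoding is an assumption made for this example)
dummies = pd.get_dummies(df["cat2"], prefix="cat2", dtype=float)
others = df[["cat1", "cat3", "num1", "num2"]]
cat2_corr = dummies.apply(lambda col: others.corrwith(col))

print(sub)
print(cw)
print(cat2_corr.round(3))
```

Both options give the same numbers for cat1. corrwith avoids computing correlations you do not need, which matters more as the number of columns grows.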