I would like to calculate the correlation coefficient between two columns of a pandas DataFrame after converting one of the columns to boolean. The original table had two columns: a Group column holding one of two treatment groups (now boolean) and an Age column. Those are the two columns I want to calculate the correlation coefficient for.
I tried the .corr() method, with:
table.corr(method='pearson')
but have this returned to me:
I have pasted the first 25 rows of the boolean table below. I don't know if I'm missing parameters, or how to interpret this result. It's also strange that it's 1 as well. Thanks in advance!
    Group  Age
0       1   50
1       1   59
2       1   22
3       1   48
4       1   53
5       1   48
6       1   29
7       1   44
8       1   28
9       1   42
10      1   35
11      0   54
12      0   43
13      1   50
14      1   62
15      0   64
16      0   39
17      1   40
18      1   59
19      1   46
20      0   56
21      1   21
22      1   45
23      0   41
24      1   46
25      0   35
Calling .corr() on the entire DataFrame gives you a full correlation matrix:
>>> table.corr()
        Group     Age
Group  1.0000 -0.1533
Age   -0.1533  1.0000
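As a rough illustration of where the 1 in that output comes from, the sketch below rebuilds only the sample rows pasted in the question (so its coefficient will not necessarily match the full-table figure above) and inspects the matrix. The diagonal is 1 because each column is perfectly correlated with itself; the off-diagonal entries hold the Group/Age coefficient. The variable names are just for illustration.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question (not the full table).
group = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,
         1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0]
age = [50, 59, 22, 48, 53, 48, 29, 44, 28, 42, 35, 54, 43,
       50, 62, 64, 39, 40, 59, 46, 56, 21, 45, 41, 46, 35]
table = pd.DataFrame({"Group": group, "Age": age})

# Full pairwise Pearson correlation matrix (2 x 2 here).
matrix = table.corr(method="pearson")

# Diagonal: each column against itself.
diagonal = np.diag(matrix.to_numpy())

# Off-diagonal: the Group/Age coefficient, looked up by label.
off_diagonal = matrix.loc["Group", "Age"]

print(matrix)
```

So a 1 in the result is expected and says nothing about the relationship between Group and Age; the number of interest is the off-diagonal one.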
You can use the separate Series instead:
>>> table['Group'].corr(table['Age'])
-0.15330486289034567
This should be faster than computing the full matrix and indexing it (with table.corr().at['Group', 'Age']; note that .iat only accepts integer positions, so label lookup needs .at or .loc). Also, this should work whether Group is bool or int dtype.
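A minimal sketch of the comparison, again using only the sample rows from the question rather than the full table, so the value it prints may differ from the one quoted above. It computes the coefficient three ways: from the two Series, from the full matrix by label with .at, and with Group cast to bool. The names r_series, r_matrix and r_bool are made up for this example.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question (not the full table).
group = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,
         1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0]
age = [50, 59, 22, 48, 53, 48, 29, 44, 28, 42, 35, 54, 43,
       50, 62, 64, 39, 40, 59, 46, 56, 21, 45, 41, 46, 35]
table = pd.DataFrame({"Group": group, "Age": age})

# Coefficient straight from the two Series (Pearson by default).
r_series = table["Group"].corr(table["Age"])

# Same coefficient taken from the full matrix by row/column label.
r_matrix = table.corr().at["Group", "Age"]

# Same calculation with Group stored as bool instead of int.
r_bool = table["Group"].astype(bool).corr(table["Age"])

print(r_series, r_matrix, r_bool)
```

All three should agree up to floating-point rounding, since True/False are treated as 1/0 in the calculation.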