I created a pandas DataFrame with two-level (MultiIndex) columns like this:
import numpy as np
import pandas as pd

A = ['ECFP', 'ECFP', 'ECFP', 'FCFP', 'FCFP', 'FCFP', 'RDK5', 'RDK5', 'RDK5']
B = ['R', 'tau', 'RMSEc', 'R', 'tau', 'RMSEc', 'R', 'tau', 'RMSEc']
C = np.array([[0.1, 0.3, 0.5, np.nan, 0.6, 0.4],
              [0.4, 0.3, 0.3, np.nan, 0.4, 0.3],
              [1.2, 1.3, 1.1, np.nan, 1.5, 1.0],
              [0.4, 0.3, 0.4, 0.8, 0.1, 0.2],
              [0.2, 0.3, 0.3, 0.3, 0.5, 0.6],
              [1.0, 1.2, 1.0, 0.9, 1.2, 1.0],
              [0.4, 0.7, 0.5, 0.4, 0.6, 0.6],
              [0.6, 0.5, 0.3, 0.3, 0.3, 0.5],
              [1.2, 1.5, 1.3, 0.97, 1.5, 1.0]])
# Each row of C holds one (A, B) column, so transpose before building the frame.
df = pd.DataFrame(data=C.T, columns=pd.MultiIndex.from_tuples(list(zip(A, B))))
# Drop every row that contains a NaN (row 3 here).
df = df.dropna(axis=0, how='any')
The final DataFrame looks like this:
   ECFP              FCFP              RDK5
      R  tau  RMSEc     R  tau  RMSEc     R  tau  RMSEc
0   0.1  0.4    1.2   0.4  0.2    1.0   0.4  0.6    1.2
1   0.3  0.3    1.3   0.3  0.3    1.2   0.7  0.5    1.5
2   0.5  0.3    1.1   0.4  0.3    1.0   0.5  0.3    1.3
4   0.6  0.4    1.5   0.1  0.5    1.2   0.6  0.3    1.5
5   0.4  0.3    1.0   0.2  0.6    1.0   0.6  0.5    1.0
How can I get the correlation matrix only between 'R' values for all types of data ('ECFP', 'FCFP', 'RDK5')?
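Use pd.IndexSlice to select the 'R' column under every top-level label (append .corr() to the selection to get the correlation matrix):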
In [53]: df.loc[:, pd.IndexSlice[:, 'R']]
Out[53]:
   ECFP  FCFP  RDK5
      R     R     R
0   0.1   0.4   0.4
1   0.3   0.3   0.7
2   0.5   0.4   0.5
4   0.6   0.1   0.6
5   0.4   0.2   0.6
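The selection above only returns the 'R' columns. To answer the question as asked, the correlation step still has to be applied. A minimal sketch, assuming the same data as in the question (the names r_only and corr are just illustrative):

```python
import numpy as np
import pandas as pd

A = ['ECFP', 'ECFP', 'ECFP', 'FCFP', 'FCFP', 'FCFP', 'RDK5', 'RDK5', 'RDK5']
B = ['R', 'tau', 'RMSEc', 'R', 'tau', 'RMSEc', 'R', 'tau', 'RMSEc']
C = np.array([[0.1, 0.3, 0.5, np.nan, 0.6, 0.4],
              [0.4, 0.3, 0.3, np.nan, 0.4, 0.3],
              [1.2, 1.3, 1.1, np.nan, 1.5, 1.0],
              [0.4, 0.3, 0.4, 0.8, 0.1, 0.2],
              [0.2, 0.3, 0.3, 0.3, 0.5, 0.6],
              [1.0, 1.2, 1.0, 0.9, 1.2, 1.0],
              [0.4, 0.7, 0.5, 0.4, 0.6, 0.6],
              [0.6, 0.5, 0.3, 0.3, 0.3, 0.5],
              [1.2, 1.5, 1.3, 0.97, 1.5, 1.0]])
df = pd.DataFrame(C.T, columns=pd.MultiIndex.from_arrays([A, B])).dropna()

# Keep only the 'R' column of every top-level group.
r_only = df.loc[:, pd.IndexSlice[:, 'R']]

# Pairwise Pearson correlation between the three 'R' columns.
corr = r_only.corr()
print(corr)
```

The result is a 3x3 matrix whose rows and columns are still labelled with the two-level keys ('ECFP', 'R'), ('FCFP', 'R') and ('RDK5', 'R').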
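Alternatively, pass slice(None) for the first column level, which selects the same columns as the IndexSlice form: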
df.loc[:,(slice(None),'R')]
Out[375]:
   ECFP  FCFP  RDK5
      R     R     R
0   0.1   0.4   0.4
1   0.3   0.3   0.7
2   0.5   0.4   0.5
4   0.6   0.1   0.6
5   0.4   0.2   0.6
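If the repeated 'R' level in the labels is unwanted, a different technique from the two answers here is DataFrame.xs, which takes a cross-section on one level and drops that level by default. A sketch under the same data assumptions as the question (corr_flat is an illustrative name):

```python
import numpy as np
import pandas as pd

A = ['ECFP', 'ECFP', 'ECFP', 'FCFP', 'FCFP', 'FCFP', 'RDK5', 'RDK5', 'RDK5']
B = ['R', 'tau', 'RMSEc', 'R', 'tau', 'RMSEc', 'R', 'tau', 'RMSEc']
C = np.array([[0.1, 0.3, 0.5, np.nan, 0.6, 0.4],
              [0.4, 0.3, 0.3, np.nan, 0.4, 0.3],
              [1.2, 1.3, 1.1, np.nan, 1.5, 1.0],
              [0.4, 0.3, 0.4, 0.8, 0.1, 0.2],
              [0.2, 0.3, 0.3, 0.3, 0.5, 0.6],
              [1.0, 1.2, 1.0, 0.9, 1.2, 1.0],
              [0.4, 0.7, 0.5, 0.4, 0.6, 0.6],
              [0.6, 0.5, 0.3, 0.3, 0.3, 0.5],
              [1.2, 1.5, 1.3, 0.97, 1.5, 1.0]])
df = pd.DataFrame(C.T, columns=pd.MultiIndex.from_arrays([A, B])).dropna()

# Cross-section on the second column level; the 'R' level is dropped,
# leaving plain columns 'ECFP', 'FCFP', 'RDK5'.
r_flat = df.xs('R', axis=1, level=1)

# Correlation matrix indexed by fingerprint type only.
corr_flat = r_flat.corr()
print(corr_flat)
```

The values should match those from the loc-based selections followed by .corr(); only the labels differ.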