I created a pandas DataFrame with two-level (MultiIndex) columns like this:
import numpy as np
import pandas as pd

A = ['ECFP', 'ECFP', 'ECFP', 'FCFP', 'FCFP', 'FCFP', 'RDK5', 'RDK5', 'RDK5']
B = ['R', 'tau', 'RMSEc', 'R', 'tau', 'RMSEc', 'R', 'tau', 'RMSEc']
C = np.array([[0.1, 0.3, 0.5, np.nan, 0.6, 0.4],
              [0.4, 0.3, 0.3, np.nan, 0.4, 0.3],
              [1.2, 1.3, 1.1, np.nan, 1.5, 1.0],
              [0.4, 0.3, 0.4, 0.8, 0.1, 0.2],
              [0.2, 0.3, 0.3, 0.3, 0.5, 0.6],
              [1.0, 1.2, 1.0, 0.9, 1.2, 1.0],
              [0.4, 0.7, 0.5, 0.4, 0.6, 0.6],
              [0.6, 0.5, 0.3, 0.3, 0.3, 0.5],
              [1.2, 1.5, 1.3, 0.97, 1.5, 1.0]])
# Transpose so that each (A, B) pair becomes one column
df = pd.DataFrame(data=C.T, columns=pd.MultiIndex.from_tuples(list(zip(A, B))))
# Drop every row that contains a NaN (row 3 here)
df = df.dropna(axis=0, how='any')
The final DataFrame looks like this:
  ECFP            FCFP            RDK5           
     R  tau RMSEc    R  tau RMSEc    R  tau RMSEc
0  0.1  0.4   1.2  0.4  0.2   1.0  0.4  0.6   1.2
1  0.3  0.3   1.3  0.3  0.3   1.2  0.7  0.5   1.5
2  0.5  0.3   1.1  0.4  0.3   1.0  0.5  0.3   1.3
4  0.6  0.4   1.5  0.1  0.5   1.2  0.6  0.3   1.5
5  0.4  0.3   1.0  0.2  0.6   1.0  0.6  0.5   1.0
How can I get the correlation matrix only between 'R' values for all types of data ('ECFP', 'FCFP', 'RDK5')?
Use pd.IndexSlice to select the 'R' column from every top-level group:
In [53]: df.loc[:, pd.IndexSlice[:, 'R']]
Out[53]:
  ECFP FCFP RDK5
     R    R    R
0  0.1  0.4  0.4
1  0.3  0.3  0.7
2  0.5  0.4  0.5
4  0.6  0.1  0.6
5  0.4  0.2  0.6
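The selection above stops short of the correlation the question asks for. As a minimal sketch, assuming the same df as in the question, calling DataFrame.corr() on the sliced frame should give the pairwise Pearson correlation between the three 'R' columns. The names r_only and r_corr are illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question so the example is self-contained
A = ['ECFP', 'ECFP', 'ECFP', 'FCFP', 'FCFP', 'FCFP', 'RDK5', 'RDK5', 'RDK5']
B = ['R', 'tau', 'RMSEc', 'R', 'tau', 'RMSEc', 'R', 'tau', 'RMSEc']
C = np.array([[0.1, 0.3, 0.5, np.nan, 0.6, 0.4],
              [0.4, 0.3, 0.3, np.nan, 0.4, 0.3],
              [1.2, 1.3, 1.1, np.nan, 1.5, 1.0],
              [0.4, 0.3, 0.4, 0.8, 0.1, 0.2],
              [0.2, 0.3, 0.3, 0.3, 0.5, 0.6],
              [1.0, 1.2, 1.0, 0.9, 1.2, 1.0],
              [0.4, 0.7, 0.5, 0.4, 0.6, 0.6],
              [0.6, 0.5, 0.3, 0.3, 0.3, 0.5],
              [1.2, 1.5, 1.3, 0.97, 1.5, 1.0]])
df = pd.DataFrame(C.T, columns=pd.MultiIndex.from_tuples(list(zip(A, B))))
df = df.dropna(axis=0, how='any')

# Keep only the 'R' column of each top-level group (illustrative name)
r_only = df.loc[:, pd.IndexSlice[:, 'R']]

# Pairwise Pearson correlation between ECFP/R, FCFP/R and RDK5/R
r_corr = r_only.corr()
print(r_corr)
```

The result is a 3x3 symmetric matrix whose rows and columns still carry both index levels, for example ('ECFP', 'R').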
Alternatively, use slice(None) in place of pd.IndexSlice:
df.loc[:,(slice(None),'R')]
Out[375]: 
  ECFP FCFP RDK5
     R    R    R
0  0.1  0.4  0.4
1  0.3  0.3  0.7
2  0.5  0.4  0.5
4  0.6  0.1  0.6
5  0.4  0.2  0.6
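As a possible variation that is not part of the answers above, DataFrame.xs can take the same cross-section while dropping the 'R' level, so that a subsequent .corr() is labelled only by 'ECFP', 'FCFP' and 'RDK5'. This is a sketch under the assumption that df is built as in the question; r_flat and r_corr are hypothetical names.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question so the example is self-contained
A = ['ECFP', 'ECFP', 'ECFP', 'FCFP', 'FCFP', 'FCFP', 'RDK5', 'RDK5', 'RDK5']
B = ['R', 'tau', 'RMSEc', 'R', 'tau', 'RMSEc', 'R', 'tau', 'RMSEc']
C = np.array([[0.1, 0.3, 0.5, np.nan, 0.6, 0.4],
              [0.4, 0.3, 0.3, np.nan, 0.4, 0.3],
              [1.2, 1.3, 1.1, np.nan, 1.5, 1.0],
              [0.4, 0.3, 0.4, 0.8, 0.1, 0.2],
              [0.2, 0.3, 0.3, 0.3, 0.5, 0.6],
              [1.0, 1.2, 1.0, 0.9, 1.2, 1.0],
              [0.4, 0.7, 0.5, 0.4, 0.6, 0.6],
              [0.6, 0.5, 0.3, 0.3, 0.3, 0.5],
              [1.2, 1.5, 1.3, 0.97, 1.5, 1.0]])
df = pd.DataFrame(C.T, columns=pd.MultiIndex.from_tuples(list(zip(A, B))))
df = df.dropna(axis=0, how='any')

# Cross-section on the second column level; xs drops that level by default,
# leaving flat column labels 'ECFP', 'FCFP', 'RDK5'
r_flat = df.xs('R', axis=1, level=1)

# Correlation matrix between the 'R' values of the three groups
r_corr = r_flat.corr()
print(r_corr)
```

Whether flat labels are preferable depends on what happens next: the slice-based selections keep the MultiIndex, which is convenient if the result is later combined with the 'tau' or 'RMSEc' columns.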