I currently have a series of 18 DataFrames (each representing a different year), each consisting of 3 columns and a varying number of rows, holding the normalized mutual information scores for pairs of amino acid residue positions, like:
Year1
Pos1 Pos2 MI_Score
40 40 1.00
40 44 0.53
40 70 0.23
44 44 1.00
44 70 0.90
...
I would like to iterate through this list of DataFrames and trim off the rows whose mutual information score is less than 0.50, as well as the rows where a residue is paired with itself. Here is what I've tried so far:
MIs = [MI_95,MI_96,MI_97,MI_98,MI_99,MI_00,MI_01,MI_02,MI_03,MI_04,MI_05,MI_06,MI_07,MI_08,MI_09,MI_10,MI_11,MI_12,MI_13]
for MI in MIs:
    p = []
    for q in range(0, len(MI)):
        if MI[0][q] != MI[1][q]:
            if MI[2][q] > 0.5:
                p.append([MI[0][q],MI[1][q],MI[2][q]])
    MI = pd.DataFrame(p)
Yet this only trims the first item in MIs. Can someone help me find a way to iterate through the whole list and trim each dataframe?
Thanks
Avoid loops where possible. They are much slower, and usually harder to read, than "vectorized" methods that operate on all the data at once. Here's one way, shown on a single DataFrame df holding the sample rows from the question:
In [17]: self_paired = df['Pos1'] == df['Pos2']
In [18]: low_MI = df['MI_Score'] < 0.50
In [19]: df[~(low_MI | self_paired)]
Out[19]:
Pos1 Pos2 MI_Score
1 40 44 0.53
4 44 70 0.90
[2 rows x 3 columns]
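To apply the same filter to every DataFrame in the list, one option is to wrap it in a small function and rebuild the list. This is only a sketch: the helper name trim, its threshold parameter, and the stand-in frames year1 and year2 are made up for illustration, and it assumes the columns are labelled Pos1, Pos2 and MI_Score as in the sample above (the loop in the question uses the integer labels 0, 1 and 2 instead). Note that assigning to the loop variable MI inside a for loop only rebinds that name, so the list has to be rebuilt or assigned to by index.

```python
import pandas as pd


def trim(df, threshold=0.50):
    """Drop self-paired rows and rows whose MI_Score is below threshold."""
    self_paired = df['Pos1'] == df['Pos2']
    low_MI = df['MI_Score'] < threshold
    return df[~(low_MI | self_paired)]


# Stand-in data built from the sample rows in the question.
year1 = pd.DataFrame({
    'Pos1': [40, 40, 40, 44, 44],
    'Pos2': [40, 44, 70, 44, 70],
    'MI_Score': [1.00, 0.53, 0.23, 1.00, 0.90],
})
year2 = year1.copy()  # hypothetical second year, same rows for simplicity
MIs = [year1, year2]

# Build a new list of trimmed frames; the originals are left untouched.
MIs = [trim(MI) for MI in MIs]
print(MIs[0])
```

The trimmed frames keep their original row labels; calling reset_index(drop=True) on each result renumbers them from 0 if that is preferred.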