
Iterating through a list of DataFrames in pandas

Tags: list, pandas

I currently have a series of 18 DataFrames (each representing a different year), each consisting of 3 columns and a varying number of rows. The rows hold normalized mutual information scores for pairs of amino acid residue positions, like:

Year1

Pos1   Pos2   MI_Score
40     40     1.00    
40     44     0.53
40     70     0.23
44     44     1.00    
44     70     0.90
...

I would like to iterate through this list of DataFrames and drop the rows whose mutual information score is less than 0.50, as well as the rows that score a residue paired with itself. Here is what I've tried so far:

MIs = [MI_95,MI_96,MI_97,MI_98,MI_99,MI_00,MI_01,MI_02,MI_03,MI_04,MI_05,MI_06,MI_07,MI_08,MI_09,MI_10,MI_11,MI_12,MI_13] 
for MI in MIs:    
    p = []
    for q in range(0, len(MI)):
        if MI[0][q] != MI[1][q]:
            if MI[2][q] > 0.5:
                p.append([MI[0][q],MI[1][q],MI[2][q]])
    MI = pd.DataFrame(p) 

Yet this only trims the first item in MIs. Can someone help me find a way to iterate through the whole list and trim each dataframe?

Thanks

user2587593 asked Mar 20 '23 22:03



1 Answer

Avoid loops where possible. They are much slower, and usually harder to read, than "vectorized" methods that operate on all the data at once. Here's one way, where df is one of your DataFrames with its columns named Pos1, Pos2, and MI_Score:

In [17]: self_paired = df['Pos1'] == df['Pos2']

In [18]: low_MI = df['MI_Score'] < 0.50

In [19]: df[~(low_MI | self_paired)]
Out[19]:
   Pos1  Pos2  MI_Score
1    40    44      0.53
4    44    70      0.90

[2 rows x 3 columns]
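
To cover the whole list rather than a single DataFrame, the same mask can be wrapped in a function and applied to each element. The sketch below is one possible way, not the only one. It assumes the columns are named Pos1, Pos2, and MI_Score as above. The helper name trim, the result name trimmed, and the second frame's values are made up for illustration. The likely reason the original loop appeared to have no lasting effect is that `MI = pd.DataFrame(p)` only rebinds the loop variable and never writes back into MIs, so the sketch builds a new list instead.

```python
import pandas as pd

def trim(df, threshold=0.50):
    """Drop self-paired rows and rows whose MI_Score is below threshold.

    Hypothetical helper; assumes columns named Pos1, Pos2 and MI_Score.
    """
    self_paired = df['Pos1'] == df['Pos2']
    low_mi = df['MI_Score'] < threshold
    return df[~(low_mi | self_paired)]

# Stand-in data: year1 copies the sample rows from the question,
# year2 uses made-up values in the same layout.
year1 = pd.DataFrame({'Pos1': [40, 40, 40, 44, 44],
                      'Pos2': [40, 44, 70, 44, 70],
                      'MI_Score': [1.00, 0.53, 0.23, 1.00, 0.90]})
year2 = pd.DataFrame({'Pos1': [12, 12, 30],
                      'Pos2': [12, 30, 30],
                      'MI_Score': [1.00, 0.41, 1.00]})
MIs = [year1, year2]

# Build a new list of filtered frames; the originals in MIs are left untouched.
trimmed = [trim(df) for df in MIs]
```

Each element of trimmed keeps the original row labels. If fresh 0-based labels are wanted, calling reset_index(drop=True) on each result is one option.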
Dan Allan answered Mar 28 '23 08:03
