Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I remove a certain type of values in a group in pandas?

I have the following dataframe which is a small part of a bigger one:

   acc_num   trans_cdi
0     1         c
1     1         d
3     3         d
4     3         c
5     3         d
6     3         d

I'd like to delete all rows where the last items are "d". So my desired dataframe would look like this:

   acc_num   trans_cdi
0     1         c
3     3         d
4     3         c

So the point is, that a group shouldn't have "d" as the last item.

There is a code that deletes the last row in the groups where the last item is "d". But in this case, I have to run the code twice to delete all last "d"-s in group 3 for example.

clean_3 = clean_2[clean_2.groupby('account_num')['trans_cdi'].transform(lambda x: (x.iloc[-1] != "d") | (x.index != x.index[-1]))]

Is there a better solution to this problem?

like image 813
craze Avatar asked Jan 25 '23 21:01

craze


2 Answers

We can use idxmax here with reversing the data [::-1] and then get the index:

grps = df['trans_cdi'].ne('d').groupby(df['acc_num'], group_keys=False)
idx = grps.apply(lambda x: x.loc[:x[::-1].idxmax()]).index
df.loc[idx]

   acc_num trans_cdi
0        1         c
3        3         d
4        3         c

Testing on consecutive value

   acc_num trans_cdi
0        1         c
1        1         d <--- d between two c, so we need to keep
2        1         c
3        1         d <--- row to be dropped
4        3         d
5        3         c
6        3         d
7        3         d

grps = df['trans_cdi'].ne('d').groupby(df['acc_num'], group_keys=False)
idx = grps.apply(lambda x: x.loc[:x[::-1].idxmax()]).index
df.loc[idx]

   acc_num trans_cdi
0        1         c
1        1         d
2        1         c
4        3         d
5        3         c

Still gives correct result.

like image 112
Erfan Avatar answered Jan 27 '23 12:01

Erfan


You can try this not so pandorable solution.

def r(x):
    c = 0
    for v in x['trans_cdi'].iloc[::-1]:
        if v == 'd':
            c = c+1
        else:
            break
    return x.iloc[:-c]

df.groupby('acc_num', group_keys=False).apply(r)

   acc_num trans_cdi
0        1         c
3        3         d
4        3         c
like image 23
Ch3steR Avatar answered Jan 27 '23 12:01

Ch3steR