Groupby Apply Filter without Lambda

Let's say I have this data:

import pandas as pd

data = {
    'batch_no': [42, 42, 52, 52, 52, 73],
    'quality': ['OK', 'NOT OK', 'OK', 'NOT OK', 'NOT OK', 'OK'],
}
df = pd.DataFrame(data, columns=['batch_no', 'quality'])

This gives me the following DataFrame:

batch_no    quality
42          OK
42          NOT OK
52          OK
52          NOT OK
52          NOT OK
73          OK

Now I need to find the count of NOT OK rows for each batch_no.

I can achieve this using groupby and apply with a lambda function as follows:

df.groupby('batch_no')['quality'].apply(lambda x: x[x.eq('NOT OK')].count())

This gives me the desired output:

batch_no
42              1
52              2
73              0

However, this is extremely slow even on my moderately sized data of around 3 million rows, which makes it unusable for my needs.

Is there a faster alternative?

bhaskarc asked Mar 10 '26 16:03


1 Answer

You can compare the quality column with 'NOT OK', group the resulting boolean Series by batch_no, and aggregate with sum. True values are treated as 1, so the sum counts the matching rows:

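# Parentheses are required for the multi-line method chain.
# The result goes into a new name so the original df stays intact.
counts = (df['quality'].eq('NOT OK')
                       .groupby(df['batch_no']).sum()
                       .astype(int)
                       .reset_index(name='count'))
print(counts)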
   batch_no  count
0        42      1
1        52      2
2        73      0

Detail:

print (df['quality'].eq('NOT OK'))
0    False
1     True
2    False
3     True
4     True
5    False
Name: quality, dtype: bool
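
As a quick sanity check, here is a minimal self-contained sketch that runs both the lambda version from the question and the mask-then-sum version on the sample data and confirms they agree. The variable names slow and fast are only illustrative. No timings are shown because the speed difference depends on the data, though the vectorized version avoids calling a Python function once per group.

```python
import pandas as pd

df = pd.DataFrame({
    'batch_no': [42, 42, 52, 52, 52, 73],
    'quality': ['OK', 'NOT OK', 'OK', 'NOT OK', 'NOT OK', 'OK'],
})

# Approach from the question: a Python lambda is called once per group.
slow = df.groupby('batch_no')['quality'].apply(
    lambda x: x[x.eq('NOT OK')].count()
)

# Vectorized approach: build the boolean mask once, then sum it per group.
# True counts as 1, so the per-group sum is the number of 'NOT OK' rows.
fast = df['quality'].eq('NOT OK').groupby(df['batch_no']).sum().astype(int)

# Both keep batch 73, which has zero 'NOT OK' rows.
print(fast.reset_index(name='count'))
```

Note that filtering the rows first (for example df[df['quality'].eq('NOT OK')]) and then grouping would drop batch 73 entirely, which is why the mask is summed instead of used as a filter.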

jezrael answered Mar 14 '26 12:03