Count specific values and aggregate the result in a DataFrame using transform

Question

I have a DataFrame similar to this:

    Errorid  Matricule Priority
0      1        01       P1
1      2        01       P2
2      3        01       NC
3      4        02       P1
4      5        02       P4
5      6        02       EDC
6      7        02       P2

This lists all the errors for each Matricule along with their priority.

What I want to do is count the errors for each Matricule, excluding "NC" and "EDC", and put the result in a new column of the same DataFrame.

Expected result:

    Errorid  Matricule Priority  NberrorsMatricule
0      1        01       P1           2
1      2        01       P2           2
2      3        01       NC           2
3      4        02       P1           3
4      5        02       P4           3
5      6        02       EDC          3
6      7        02       P2           3

I tried several things, such as:

DF['NberrorsMatricule'] = DF.groupby('Matricule')['Priority'].transform(lambda x : x.count() if x in ['P1','P2','P3','P4'] else 0)

DF['NberrorsMatricule'] = DF.groupby('Matricule')[DF['Priority'] in ['P1','P2','P3','P4']].transform("count")

Each time I get this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Note that this one works:

DF['NberrorsMatricule'] = DF.groupby('Matricule')['Priority'].transform("count")

But it obviously doesn't filter by priority.

These DataFrames are only examples; in reality I work with a large amount of data (more than 400k rows in this one). So if someone could help me understand how transform() behaves, and how to filter the data efficiently, that would be very helpful.

Thanks in advance for your help

jezrael · Accepted Answer

You can replace non-matching values with missing values using Series.where and Series.isin. Because GroupBy.count excludes missing values, GroupBy.transform with 'count' then counts only the matching priorities:

L = ['P1','P2','P3','P4']
df['NberrorsMatricule'] = (df['Priority'].where(df['Priority'].isin(L))
                                         .groupby(df['Matricule'])
                                         .transform('count'))
print (df)
   Errorid  Matricule Priority  NberrorsMatricule
0        1          1       P1                  2
1        2          1       P2                  2
2        3          1       NC                  2
3        4          2       P1                  3
4        5          2       P4                  3
5        6          2      EDC                  3
6        7          2       P2                  3
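For anyone who wants to run this end to end, here is a self-contained sketch of the where/isin approach above. It rebuilds the sample data from the question, assuming Matricule is stored as integers (the question shows "01", "02", which may be strings in the real data; the grouping works the same either way).

```python
import pandas as pd

# Sample data from the question (Matricule assumed to be integers here).
df = pd.DataFrame({
    'Errorid': [1, 2, 3, 4, 5, 6, 7],
    'Matricule': [1, 1, 1, 2, 2, 2, 2],
    'Priority': ['P1', 'P2', 'NC', 'P1', 'P4', 'EDC', 'P2'],
})

# Priorities that count as real errors.
L = ['P1', 'P2', 'P3', 'P4']

# Non-matching priorities become NaN; 'count' skips NaN, and transform
# broadcasts each group's count back to every row of that group.
df['NberrorsMatricule'] = (df['Priority'].where(df['Priority'].isin(L))
                           .groupby(df['Matricule'])
                           .transform('count'))
print(df)
```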

Details:

print (df['Priority'].where(df['Priority'].isin(L)))
0     P1
1     P2
2    NaN
3     P1
4     P4
5    NaN
6     P2
Name: Priority, dtype: object
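Since the question also asks how transform() behaves, a small sketch may help. A plain aggregation such as .count() returns one value per group, while transform returns a result aligned to the original index (one value per row), which is why it can be assigned directly as a new column. The sketch also shows why the attempts in the question raise the ValueError: `x in [...]` needs a single True/False, but comparing a whole Series to 'P1' produces a boolean Series. The sample data is the question's, trimmed to the two relevant columns.

```python
import pandas as pd

df = pd.DataFrame({
    'Matricule': [1, 1, 1, 2, 2, 2, 2],
    'Priority': ['P1', 'P2', 'NC', 'P1', 'P4', 'EDC', 'P2'],
})
valid = df['Priority'].where(df['Priority'].isin(['P1', 'P2', 'P3', 'P4']))

# Aggregation: one row per group, indexed by Matricule.
per_group = valid.groupby(df['Matricule']).count()
print(per_group)

# transform: one value per original row, same index as df.
per_row = valid.groupby(df['Matricule']).transform('count')
print(per_row)

# The question's `x in [...]` pattern: Python compares the Series with each
# list element, gets a boolean Series back, and cannot turn it into one bool.
try:
    df['Priority'] in ['P1', 'P2', 'P3', 'P4']
    raised = False
except ValueError:
    raised = True
print(raised)
```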

Another solution is to count the matched values with sum. To convert True and False to 1 and 0 you can use Series.view or Series.astype (Series.view is deprecated as of pandas 2.2, so astype is the safer choice):

df['NberrorsMatricule'] = (df['Priority'].isin(L)
                                         .astype('i1')
                                         .groupby(df['Matricule'])
                                         .transform('sum'))
print (df)

   Errorid  Matricule Priority  NberrorsMatricule
0        1          1       P1                  2
1        2          1       P2                  2
2        3          1       NC                  2
3        4          2       P1                  3
4        5          2       P4                  3
5        6          2      EDC                  3
6        7          2       P2                  3
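A different technique (not from either answer) that avoids building an intermediate per-row Series for the groupby is to count the valid rows once per Matricule with value_counts and then map those counts back onto every row. This is a sketch; whether it is faster than transform on a 400k-row frame is worth measuring on your own data. The extra Matricule 3 row is hypothetical, added only to show that a Matricule with no valid error gets 0.

```python
import pandas as pd

df = pd.DataFrame({
    'Errorid': [1, 2, 3, 4, 5, 6, 7, 8],
    # Matricule 3 (last row) is a hypothetical group with only an NC entry.
    'Matricule': [1, 1, 1, 2, 2, 2, 2, 3],
    'Priority': ['P1', 'P2', 'NC', 'P1', 'P4', 'EDC', 'P2', 'NC'],
})
L = ['P1', 'P2', 'P3', 'P4']

# One count per Matricule, computed on the valid rows only.
counts = df.loc[df['Priority'].isin(L), 'Matricule'].value_counts()

# Look up each row's Matricule in the counts; Matricules with no valid
# error are absent from `counts`, so map gives NaN and we fill with 0.
df['NberrorsMatricule'] = df['Matricule'].map(counts).fillna(0).astype(int)
print(df)
```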

Mayank Porwal · Answer

Filter out the NC and EDC rows first, then count per Matricule with transform:

In [567]: df['NberrorsMatricule'] = (df[~df['Priority'].isin(['NC', 'EDC'])]
     ...:                            .groupby('Matricule')['Errorid']
     ...:                            .transform('count'))

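Because the counts are computed on the filtered rows only, assigning them back aligns on the index and leaves NaN in the excluded (NC and EDC) rows. To fill them, use ffill():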

In [595]: df['NberrorsMatricule'] = df['NberrorsMatricule'].ffill()

In [596]: df
Out[596]: 
   Errorid  Matricule Priority  NberrorsMatricule
0        1          1       P1                2.0
1        2          1       P2                2.0
2        3          1       NC                2.0
3        4          2       P1                3.0
4        5          2       P4                3.0
5        6          2      EDC                3.0
6        7          2       P2                3.0
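One caveat: ffill() copies the value from the previous row, so it gives the right count only when the first row of each Matricule is not an excluded one; otherwise that row inherits the previous Matricule's count. A sketch of a group-aware fill follows, using hypothetical data in which Matricule 2 starts with its EDC row. Taking the per-group max with transform ignores the NaN rows, and fillna(0) covers a Matricule whose rows are all excluded.

```python
import pandas as pd

# Hypothetical reordering of the question's data: Matricule 2 starts with EDC.
df = pd.DataFrame({
    'Errorid': [1, 2, 3, 4, 5, 6, 7],
    'Matricule': [1, 1, 1, 2, 2, 2, 2],
    'Priority': ['P1', 'P2', 'NC', 'EDC', 'P1', 'P4', 'P2'],
})

mask = ~df['Priority'].isin(['NC', 'EDC'])
# Assigning the filtered result aligns on the index: excluded rows get NaN.
df['NberrorsMatricule'] = (df[mask].groupby('Matricule')['Errorid']
                           .transform('count'))

# Plain ffill: the EDC row of Matricule 2 picks up Matricule 1's count.
naive = df['NberrorsMatricule'].ffill()
print(naive.tolist())

# Fill within each Matricule instead; max skips NaN.
df['NberrorsMatricule'] = (df.groupby('Matricule')['NberrorsMatricule']
                           .transform('max')
                           .fillna(0)
                           .astype(int))
print(df)
```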

Tags: python, pandas, dataframe, group-by, transform

zonas
