I have looked through various SO questions that solve the problem of counting duplicate rows on specific columns; the most relevant is this one.
The thing is, that solution is very specific and I can't figure out how to generalize it to dataframes with many more columns. I have a dataframe with many columns, and I want to add a new column named 'A_D_E_count' that indicates, for each row, how many rows in the entire dataframe have the same values in the A, D and E columns.
Preferably, this should work using the .transform function.
Example:
Out[6]:
A B C D E
0 294 41981 37597 39875 33364
1 294 39776 37597 37572 39171
2 294 44658 49408 43713 49408
3 294 58615 52065 43713 49408
4 294 44811 51238 42926 49408
Over this dataframe, I would like to add a column that counts the number of rows containing the same A, D and E values, so the result would be:
Out[6]:
A B C D E A_D_E_count
0 294 41981 37597 39875 33364 1
1 294 39776 37597 37572 39171 1
2 294 44658 49408 43713 49408 2
3 294 58615 52065 43713 49408 2
4 294 44811 51238 42926 49408 1
I think you need size, or count if you don't need to count NaNs, with transform:
cols = ['A','D','E']
# size counts all rows in each (A, D, E) group, NaNs included
df['A_D_E_count'] = df.groupby(cols)['A'].transform('size')
print(df)
A B C D E A_D_E_count
0 294 41981 37597 39875 33364 1
1 294 39776 37597 37572 39171 1
2 294 44658 49408 43713 49408 2
3 294 58615 52065 43713 49408 2
4 294 44811 51238 42926 49408 1
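For completeness, here is a minimal, self-contained sketch of the count variant. It rebuilds the example dataframe from above and adds one illustrative NaN (an assumption, not part of the original data) so the difference from size is visible: count counts only non-NaN values of the selected column, while size counts every row in the group.

import pandas as pd
import numpy as np

# Rebuild the example dataframe from the question.
df = pd.DataFrame({
    'A': [294, 294, 294, 294, 294],
    'B': [41981, 39776, 44658, 58615, 44811],
    'C': [37597, 37597, 49408, 52065, 51238],
    'D': [39875, 37572, 43713, 43713, 42926],
    'E': [33364, 39171, 49408, 49408, 49408],
})
# Hypothetical NaN added for illustration only.
df.loc[2, 'B'] = np.nan

cols = ['A','D','E']
# 'count' skips the NaN in row 2's B, so the (294, 43713, 49408)
# group yields 1 here, where 'size' would yield 2.
df['A_D_E_count'] = df.groupby(cols)['B'].transform('count')
print(df)

In short: use size when you want the raw group size, and count when NaNs in the counted column should be excluded.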