I have looked through various SO questions that solve the problem of counting duplicate rows on specific columns; the most relevant is this one.
The thing is, that solution is very specific and I can't figure out how to generalize it to dataframes with many more columns. I have a dataframe with many columns, and I want to add a new column named 'A_D_E_count' that indicates, for each row, how many rows in the entire dataframe have the same values in the A, D and E columns.
Preferably, this should work using the .transform function.
Example:
Out[6]:
A B C D E
0 294 41981 37597 39875 33364
1 294 39776 37597 37572 39171
2 294 44658 49408 43713 49408
3 294 58615 52065 43713 49408
4 294 44811 51238 42926 49408
Over this dataframe, I would like to add a column that counts the number of rows containing the same A, D and E values, so the result would be:
Out[6]:
A B C D E A_D_E_count
0 294 41981 37597 39875 33364 1
1 294 39776 37597 37572 39171 1
2 294 44658 49408 43713 49408 2
3 294 58615 52065 43713 49408 2
4 294 44811 51238 42926 49408 1
I think you need size, or count if you don't need to count NaNs, with transform:
cols = ['A','D','E']
# size counts all rows in each (A, D, E) group, NaNs included
df['A_D_E_count'] = df.groupby(cols)['A'].transform('size')
print(df)
A B C D E A_D_E_count
0 294 41981 37597 39875 33364 1
1 294 39776 37597 37572 39171 1
2 294 44658 49408 43713 49408 2
3 294 58615 52065 43713 49408 2
4 294 44811 51238 42926 49408 1
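For completeness, here is a minimal, self-contained sketch of the count variant. It rebuilds the example dataframe from above and adds one illustrative NaN (an assumption, not part of the original data) so the difference from size is visible: count counts only non-NaN values of the selected column, while size counts every row in the group.

import pandas as pd
import numpy as np

# Rebuild the example dataframe from the question.
df = pd.DataFrame({
    'A': [294, 294, 294, 294, 294],
    'B': [41981, 39776, 44658, 58615, 44811],
    'C': [37597, 37597, 49408, 52065, 51238],
    'D': [39875, 37572, 43713, 43713, 42926],
    'E': [33364, 39171, 49408, 49408, 49408],
})
# Hypothetical NaN added for illustration only.
df.loc[2, 'B'] = np.nan

cols = ['A','D','E']
# 'count' skips the NaN in row 2's B, so the (294, 43713, 49408)
# group yields 1 here, where 'size' would yield 2.
df['A_D_E_count'] = df.groupby(cols)['B'].transform('count')
print(df)

In short: use size when you want the raw group size, and count when NaNs in the counted column should be excluded.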