
How to find which columns have duplicate values within each group of a pandas groupby?

First, I have a DataFrame. When I group it by a column, does that remove duplicate values? Second, how can I tell which groups contain duplicate values? (I searched for how to find which columns of a DataFrame have duplicate values but couldn't find anything; the results only cover whether each individual element is duplicated.)

For example, I have a DataFrame like this:
     A    B   C
1    1    2   3
2    1    4   3
3    2    2   2
4    2    3   4
5    2    2   3

After groupby('A'):

A    B       C
1    2       3
     4       3
2    2       2
     3       2
     2       3

I want to know how many groups of A have duplicated values in B, and how many groups of A have duplicated values in C.

Expected result:
   B    C
1  1    2

Or, maybe better, calculate the percentage:

B : 50%
C : 100%

Thanks.

robocon20x asked Nov 16 '25 19:11



2 Answers

You could use a lambda function inside GroupBy.agg that checks whether the number of unique values in a group differs from the number of values in that group. Series.nunique gives the number of unique values, and Series.size gives the number of values in the group. Note that groupby(level=0) groups on the index, so this assumes A is the index of df.

df.groupby(level=0).agg(lambda x: x.size!=x.nunique())

#        B      C
# 1  False   True
# 2   True  False
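
A self-contained sketch of this approach, assuming the values from the first table in the question and assuming A starts out as an ordinary column (so it is moved into the index before groupby(level=0)). The variable name flags is only illustrative.

```python
import pandas as pd

# Data copied from the first table in the question (row labels 1..5).
df = pd.DataFrame(
    {"A": [1, 1, 2, 2, 2], "B": [2, 4, 2, 3, 2], "C": [3, 3, 2, 4, 3]},
    index=[1, 2, 3, 4, 5],
)

# groupby(level=0) groups on the index, so A is moved into the index first.
# A group has a duplicate in a column when it holds more values than
# distinct values. Series.nunique skips NaN by default, so a group that
# contains NaN is also flagged by this comparison.
flags = df.set_index("A").groupby(level=0).agg(lambda x: x.size != x.nunique())
print(flags)
```

If A is kept as a column, df.groupby("A") in place of set_index("A").groupby(level=0) should give the same flags.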
Ch3steR answered Nov 19 '25 09:11



Let us try Series.duplicated with any inside GroupBy.agg:

out = df.groupby(level=0).agg(lambda x: x.duplicated().any())

#        B      C
# 1  False   True
# 2   True  False
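
To get from these per-group flags to the per-column counts and percentages the question asks for, one option is to sum and average the boolean frame. This is a sketch under assumptions: it uses the values from the first table in the question and groups on A as a column, and has_dup, counts, and percent are illustrative names. The grouped view in the question lists C = 2 where the first table has 4, so the totals here differ from the asker's expected result for C.

```python
import pandas as pd

# Data copied from the first table in the question.
df = pd.DataFrame(
    {"A": [1, 1, 2, 2, 2], "B": [2, 4, 2, 3, 2], "C": [3, 3, 2, 4, 3]}
)

# One boolean per (group, column): True if any value repeats in the group.
# astype(bool) guards against an object-dtype result from the lambda.
has_dup = df.groupby("A").agg(lambda x: x.duplicated().any()).astype(bool)

# True counts as 1, so the column sum is the number of groups with a
# duplicate, and the column mean is the fraction of such groups.
counts = has_dup.sum()
percent = has_dup.mean() * 100

print(counts)
print(percent)
```

With this data, counts is 1 for both B and C, and percent is 50.0 for both.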
BENY answered Nov 19 '25 08:11
