I have this simple dataframe df:
a,b
1,2
1,3
1,4
1,2
2,1
2,2
2,3
2,5
2,5
I would like to check whether there are duplicates in b with respect to each group in a. So far I did the following:
g = df.groupby('a')['b'].unique()
which returns:
a
1       [2, 3, 4]
2    [1, 2, 3, 5]
But what I would like to have is, for each group in a, a list of the values in b that occur more than once. The expected output in this case would be:
a
1    [2]
2    [5]
You can use the duplicated() function to find duplicate values in a pandas DataFrame.
From the docs: "NA groups in GroupBy are automatically excluded".
The pandas.DataFrame.duplicated() method is used to find duplicate rows in a DataFrame. It returns a boolean Series that marks each row as a duplicate (True) or not (False).
g = df.groupby('a')['b'].value_counts()
# keep only the (a, b) pairs that occur more than once
g[g > 1]
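The value_counts() approach above leaves the repeated values in a MultiIndex of (a, b) pairs rather than in one list per group. A minimal sketch of one way to get the list form asked for in the question, assuming the sample data shown there (the names counts, repeated, pairs and result are only illustrative):

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "a": [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "b": [2, 3, 4, 2, 1, 2, 3, 5, 5],
})

# Count how often each b value occurs inside each group of a.
counts = df.groupby("a")["b"].value_counts()

# Keep only the (a, b) pairs that occur more than once.
repeated = counts[counts > 1]

# Turn the (a, b) index into columns, then collect b into a list per group.
pairs = repeated.index.to_frame(index=False)
result = pairs.groupby("a")["b"].agg(list)
print(result)
```

For the sample data this should give [2] for group 1 and [5] for group 2. A group with no repeated values does not appear in the result at all.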
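We can use duplicated() on the whole DataFrame, since a repeated (a, b) row means that the value of b is repeated within its group in a: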
print(df[df.duplicated()].drop_duplicates())
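The duplicated() call above returns the repeated rows as a DataFrame. If a Series indexed by a is preferred, as in the expected output, one possible variation is to group the flagged rows afterwards. This is a sketch under the assumption that only columns a and b define a duplicate (mask and result are illustrative names):

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "a": [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "b": [2, 3, 4, 2, 1, 2, 3, 5, 5],
})

# True for every row whose (a, b) pair already appeared earlier in the frame.
mask = df.duplicated(subset=["a", "b"])

# Collect the distinct repeated b values for each group of a.
result = df.loc[mask].groupby("a")["b"].unique()
print(result)
```

Here unique() returns a NumPy array per group, so a value that occurs three or more times is still listed once. Groups without any repeated value are missing from the result.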