I have this simple dataframe df:
a,b
1,2
1,3
1,4
1,2
2,1
2,2
2,3
2,5
2,5
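For reference, the same frame can be built directly; this is just a minimal sketch of the setup shown above:
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 1, 2, 2, 2, 2, 2],
                   'b': [2, 3, 4, 2, 1, 2, 3, 5, 5]})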
I would like to check whether there are duplicates in b with respect to each group in a. So far I did the following:
g = df.groupby('a')['b'].unique()
which returns:
a
1 [2, 3, 4]
2 [1, 2, 3, 5]
But what I would like to have is, for each group in a, a list of the b values that occur more than once. The expected output in this case would be:
a
1 [2]
2 [5]
You can use the duplicated() method to find duplicate rows in a pandas DataFrame.
pandas.DataFrame.duplicated() returns a boolean Series that marks whether each row is a duplicate of an earlier row or unique.
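As a minimal illustration (mine, not part of the original answer, and assuming the df built above):
# True for each row whose (a, b) combination has already been seen (keep='first' by default)
mask = df.duplicated()
print(df[mask])   # the second occurrences of (1, 2) and (2, 5)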
# count occurrences of each b value within each a group
g = df.groupby('a')['b'].value_counts()
# keep only the (a, b) pairs that occur more than once
g.where(g > 1).dropna()
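If you want the result in the exact shape of the expected output (one list per group), one possible follow-up, sketched here rather than taken from the answer, is:
dupes = g[g > 1]
# collect the repeated b values back into one list per group
print(dupes.index.to_frame(index=False).groupby('a')['b'].unique())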
We can use duplicated():
# keep each duplicated (a, b) row once
print(df[df.duplicated()].drop_duplicates())
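To match the per-group format asked for, the same idea can be grouped back by a; again a sketch, not part of the original answer:
# one list of repeated b values per group
print(df[df.duplicated()].groupby('a')['b'].unique())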