
pandas - check for non-unique values in dataframe groupby

Tags: python, pandas

I have this simple dataframe df:

a,b
1,2
1,3
1,4
1,2
2,1
2,2
2,3
2,5
2,5

I would like to check whether there are duplicates in b with respect to each group in a. So far I did the following:

g = df.groupby('a')['b'].unique()

which returns:

a
1       [2, 3, 4]
2    [1, 2, 3, 5]

But what I would like to have is, for each group in a, a list of the values that occur more than once in b. The expected output in this case would be:

a
1    [2]
2    [5]

asked Nov 16 '15 09:11 by Fabio Lamanna


2 Answers

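# Count how often each value of b occurs within each group of a
g = df.groupby('a')['b'].value_counts()
# Keep only the (a, b) pairs that occur more than once
g[g > 1]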
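
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's dataframe and applies the same value_counts idea. The variable names counts and repeated are mine, not from the answer, and plain boolean indexing is used so the counts stay integers.

```python
import pandas as pd

# Rebuild the dataframe from the question
df = pd.DataFrame({
    "a": [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "b": [2, 3, 4, 2, 1, 2, 3, 5, 5],
})

# Occurrences of each b value inside each group of a,
# returned as a Series indexed by (a, b)
counts = df.groupby("a")["b"].value_counts()

# Keep only the (a, b) pairs seen more than once
repeated = counts[counts > 1]
print(repeated)
```

The result is indexed by (a, b) and holds the counts, so it tells you how often each value repeats rather than giving the per-group list asked for in the question.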

answered Sep 21 '22 15:09 by atomh33ls


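We can use duplicated(), which flags every row that repeats an earlier row. Since df only has the columns a and b, a repeated row is a value of b that occurs more than once within the same group of a: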

print(df[df.duplicated()].drop_duplicates())
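
To get the exact shape asked for in the question (one list per group), the duplicated rows can be grouped again. This is only a sketch: the names dupes and result are mine, and passing subset=["a", "b"] is an assumption that keeps the check limited to these two columns if the dataframe has more.

```python
import pandas as pd

# Rebuild the dataframe from the question
df = pd.DataFrame({
    "a": [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "b": [2, 3, 4, 2, 1, 2, 3, 5, 5],
})

# Rows whose (a, b) pair has already appeared earlier in the frame
dupes = df[df.duplicated(subset=["a", "b"])]

# Collapse to the distinct repeated values of b for each group of a
result = dupes.groupby("a")["b"].unique()
print(result)
```

Groups with no repeated values do not appear in result at all, because they contribute no rows to dupes.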

answered Sep 22 '22 15:09 by akrun