Pandas drop duplicates within groupby [duplicate]

Tags:

python

pandas

This is what my CSV looks like:

name, cuisine, review
A, Chinese, this
A, Indian, is
B, Indian, an
B, Indian, example
B, French, thank
C, French, you
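The following is a minimal sketch for loading the sample above into a DataFrame. It assumes the data is read with pd.read_csv, inlines it through io.StringIO, and uses csv_text as a purely illustrative name. Every comma in the sample is followed by a space, so skipinitialspace=True is passed. This makes the column name cuisine rather than " cuisine" and strips the leading space from the values.

```python
import io

import pandas as pd

# The question's sample, inlined so the example is self-contained
# (csv_text is an illustrative name, not from the original post).
csv_text = """name, cuisine, review
A, Chinese, this
A, Indian, is
B, Indian, an
B, Indian, example
B, French, thank
C, French, you
"""

# skipinitialspace=True drops the space after each comma, in the header
# row as well as in the data rows.
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
print(df)
```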

I'm trying to count, for each kind of cuisine, how many different names it appears under. This is what I should be getting:

Cuisine, Count
Chinese, 1
Indian, 2
French, 2

But as you can see, there are duplicates within a name (e.g. B has Indian twice), so I tried to use drop_duplicates, but I can't get it to work. I used

df.groupby('name')['cuisine'].drop_duplicates() 

and I get an error saying that a SeriesGroupBy object cannot do this.

Somehow I need to apply value_counts() to get the number of occurrences of each cuisine, but the duplicates are getting in the way. Any idea how I can do this in pandas? Thanks.
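The following sketch shows two ways to remove the per-name duplicates before calling value_counts(). It is not taken from the original post, and the variable names are illustrative. The first calls drop_duplicates on the whole DataFrame with subset set to the name and cuisine columns. The second keeps the original groupby idea. Because drop_duplicates cannot be called directly on a SeriesGroupBy, it is called on each group through apply. Both routes should give the same counts.

```python
import pandas as pd

# The question's sample data (the review column does not affect the count).
df = pd.DataFrame({
    "name": ["A", "A", "B", "B", "B", "C"],
    "cuisine": ["Chinese", "Indian", "Indian", "Indian", "French", "French"],
})

# Option 1: keep one row per (name, cuisine) pair, then count cuisines.
counts_frame = df.drop_duplicates(subset=["name", "cuisine"])["cuisine"].value_counts()

# Option 2: drop duplicates inside each name group via apply, then count.
deduped = df.groupby("name")["cuisine"].apply(lambda s: s.drop_duplicates())
counts_groupby = deduped.value_counts()

print(counts_frame)
print(counts_groupby)
```

value_counts() sorts by count. The relative order of tied cuisines (Indian and French) is not something to rely on, so compare by label if the order matters.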

Ah Sheng asked Nov 09 '18 03:11



2 Answers

You're looking for groupby and nunique:

df.groupby('cuisine', sort=False).name.nunique().to_frame('count')

         count
cuisine       
Chinese      1
Indian       2
French       2

This returns the number of unique names in each cuisine group.
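Here is a self-contained sketch of this answer that builds the question's data inline. The review column is omitted because it does not affect the count. The size() line is included only for comparison. It shows that a plain row count per cuisine gives 3 for Indian, because B has two Indian reviews, while nunique counts each name once.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["A", "A", "B", "B", "B", "C"],
    "cuisine": ["Chinese", "Indian", "Indian", "Indian", "French", "French"],
})

# Plain row count per cuisine: both of B's Indian rows are counted.
row_counts = df.groupby("cuisine", sort=False).size()

# Distinct names per cuisine; sort=False keeps the order of first appearance.
result = df.groupby("cuisine", sort=False)["name"].nunique().to_frame("count")

print(row_counts)
print(result)
```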

cs95 answered Oct 19 '22 05:10


Using crosstab:

pd.crosstab(df.name, df.cuisine).ne(0).sum()

cuisine
 Chinese    1
 French     2
 Indian     2
dtype: int64
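Here is a runnable sketch of the crosstab approach, again building the question's data inline. The intermediate name table is illustrative. crosstab produces a name-by-cuisine table of row counts. ne(0) then marks each cell True when that name has at least one review in that cuisine. Summing the booleans down each column gives the number of distinct names per cuisine.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["A", "A", "B", "B", "B", "C"],
    "cuisine": ["Chinese", "Indian", "Indian", "Indian", "French", "French"],
})

# Rows are names, columns are cuisines, cells count rows per (name, cuisine).
table = pd.crosstab(df["name"], df["cuisine"])

# True where a name has the cuisine at least once; column sums count names.
counts = table.ne(0).sum()
print(counts)
```

Unlike the groupby version, crosstab sorts the cuisine columns alphabetically.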
BENY answered Oct 19 '22 07:10