This is what my CSV looks like:
name, cuisine, review
A, Chinese, this
A, Indian, is
B, Indian, an
B, Indian, example
B, French, thank
C, French, you
I am trying to count how many times each kind of cuisine appears per name. This is what I should be getting:
Cuisine, Count
Chinese, 1
Indian, 2
French, 2
But as you can see there are duplicates within a name, e.g. B, so I tried drop_duplicates but I can't get it to work. I use
df.groupby('name')['cuisine'].drop_duplicates()
and it fails, saying the SeriesGroupBy object cannot do that.
Somehow I need to apply value_counts() to get the number of occurrences of each cuisine, but the duplicates are getting in the way. Any idea how I can get this in pandas? Thanks.
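One way to do exactly what the question describes, a minimal sketch assuming the column names shown above, is to drop duplicate (name, cuisine) pairs first and then apply value_counts():

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    'name':    ['A', 'A', 'B', 'B', 'B', 'C'],
    'cuisine': ['Chinese', 'Indian', 'Indian', 'Indian', 'French', 'French'],
    'review':  ['this', 'is', 'an', 'example', 'thank', 'you'],
})

# Drop duplicate (name, cuisine) pairs first, so B's repeated
# Indian rows collapse into one, then count each cuisine
counts = df.drop_duplicates(subset=['name', 'cuisine'])['cuisine'].value_counts()
print(counts)
```

Passing `subset=['name', 'cuisine']` is the key step: it deduplicates on the pair of columns rather than on whole rows, which would not match here because the review text differs.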
You're looking for groupby and nunique:
df.groupby('cuisine', sort=False).name.nunique().to_frame('count')
count
cuisine
Chinese 1
Indian 2
French 2
This returns the count of unique names per cuisine group.
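As a self-contained check (with the question's data reconstructed as an assumption), the one-liner above counts distinct names per cuisine, so B's repeated (B, Indian) rows are only counted once:

```python
import pandas as pd

df = pd.DataFrame({
    'name':    ['A', 'A', 'B', 'B', 'B', 'C'],
    'cuisine': ['Chinese', 'Indian', 'Indian', 'Indian', 'French', 'French'],
})

# nunique() counts distinct names within each cuisine group,
# which deduplicates the (name, cuisine) pairs implicitly
out = df.groupby('cuisine', sort=False).name.nunique().to_frame('count')
print(out)
```

`sort=False` keeps the cuisines in order of first appearance (Chinese, Indian, French) instead of sorting them alphabetically.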
Using crosstab
pd.crosstab(df.name,df.cuisine).ne(0).sum()
Out[550]:
cuisine
Chinese 1
French 2
Indian 2
dtype: int64
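To see why the crosstab approach works, here is a sketch (with the sample data assumed) that prints the intermediate table: crosstab builds a name-by-cuisine frequency matrix, ne(0) turns it into a boolean "did this name review this cuisine at all", and sum() counts the True values down each column:

```python
import pandas as pd

df = pd.DataFrame({
    'name':    ['A', 'A', 'B', 'B', 'B', 'C'],
    'cuisine': ['Chinese', 'Indian', 'Indian', 'Indian', 'French', 'French'],
})

# Frequency table: rows are names, columns are cuisines,
# cells are how many times that name/cuisine pair occurs
table = pd.crosstab(df.name, df.cuisine)
print(table)

# ne(0) -> boolean presence matrix; sum() -> names per cuisine
result = table.ne(0).sum()
print(result)
```

Note that unlike the groupby answer, crosstab sorts the cuisine columns alphabetically, which is why Indian and French appear in a different order in the output above.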