This is what my CSV looks like:
name, cuisine, review
A, Chinese, this
A, Indian, is
B, Indian, an
B, Indian, example
B, French, thank
C, French, you
I am trying to count how many times each kind of cuisine appears per name. This is what I should be getting:
Cuisine, Count
Chinese, 1
Indian, 2
French, 2
But as you can see there are duplicates within a name, e.g. B, so I tried drop_duplicates but I can't get it to work. I use
df.groupby('name')['cuisine'].drop_duplicates()
and it fails, saying the SeriesGroupBy object cannot do that.
Somehow I need to apply value_counts() to get the number of occurrences of each cuisine, but the duplicates are getting in the way. Any idea how I can get this in pandas? Thanks.
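One way to do exactly what the question describes, a minimal sketch assuming the column names shown above, is to drop duplicate (name, cuisine) pairs first and then apply value_counts():

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    'name':    ['A', 'A', 'B', 'B', 'B', 'C'],
    'cuisine': ['Chinese', 'Indian', 'Indian', 'Indian', 'French', 'French'],
    'review':  ['this', 'is', 'an', 'example', 'thank', 'you'],
})

# Drop duplicate (name, cuisine) pairs first, so B's repeated
# Indian rows collapse into one, then count each cuisine
counts = df.drop_duplicates(subset=['name', 'cuisine'])['cuisine'].value_counts()
print(counts)
```

Passing `subset=['name', 'cuisine']` is the key step: it deduplicates on the pair of columns rather than on whole rows, which would not match here because the review text differs.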
You're looking for groupby and nunique:
df.groupby('cuisine', sort=False).name.nunique().to_frame('count')
count
cuisine
Chinese 1
Indian 2
French 2
This returns the count of unique names per cuisine group.
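As a self-contained check (with the question's data reconstructed as an assumption), the one-liner above counts distinct names per cuisine, so B's repeated (B, Indian) rows are only counted once:

```python
import pandas as pd

df = pd.DataFrame({
    'name':    ['A', 'A', 'B', 'B', 'B', 'C'],
    'cuisine': ['Chinese', 'Indian', 'Indian', 'Indian', 'French', 'French'],
})

# nunique() counts distinct names within each cuisine group,
# which deduplicates the (name, cuisine) pairs implicitly
out = df.groupby('cuisine', sort=False).name.nunique().to_frame('count')
print(out)
```

`sort=False` keeps the cuisines in order of first appearance (Chinese, Indian, French) instead of sorting them alphabetically.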
Using crosstab
pd.crosstab(df.name,df.cuisine).ne(0).sum()
Out[550]:
cuisine
Chinese 1
French 2
Indian 2
dtype: int64
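To see why the crosstab approach works, here is a sketch (with the sample data assumed) that prints the intermediate table: crosstab builds a name-by-cuisine frequency matrix, ne(0) turns it into a boolean "did this name review this cuisine at all", and sum() counts the True values down each column:

```python
import pandas as pd

df = pd.DataFrame({
    'name':    ['A', 'A', 'B', 'B', 'B', 'C'],
    'cuisine': ['Chinese', 'Indian', 'Indian', 'Indian', 'French', 'French'],
})

# Frequency table: rows are names, columns are cuisines,
# cells are how many times that name/cuisine pair occurs
table = pd.crosstab(df.name, df.cuisine)
print(table)

# ne(0) -> boolean presence matrix; sum() -> names per cuisine
result = table.ne(0).sum()
print(result)
```

Note that unlike the groupby answer, crosstab sorts the cuisine columns alphabetically, which is why Indian and French appear in a different order in the output above.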