Given a DataFrame that logs uses of some books like this:
Name   Type   ID
Book1  ebook  1
Book2  paper  2
Book3  paper  3
Book1  ebook  1
Book2  paper  2
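For a reproducible setup, the frame can be built like this (a minimal sketch; the variable name df is assumed throughout):

import pandas as pd

# Sample usage log matching the table above
df = pd.DataFrame({
    'Name': ['Book1', 'Book2', 'Book3', 'Book1', 'Book2'],
    'Type': ['ebook', 'paper', 'paper', 'ebook', 'paper'],
    'ID':   [1, 2, 3, 1, 2],
})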
I need to get the count of each book while keeping the other columns, to end up with this:
Name   Type   ID  Count
Book1  ebook  1   2
Book2  paper  2   2
Book3  paper  3   1
How can this be done?
Thanks!
Pandas comes with a whole host of SQL-like aggregation functions you can apply when grouping on one or more columns. This is Python's closest equivalent to dplyr's group_by + summarise logic.
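To illustrate that SQL-like flavour, here is a sketch using named aggregation on the example frame (the output names n_uses and n_ids are made up for this example):

# Roughly: SELECT Type, COUNT(ID) AS n_uses, COUNT(DISTINCT ID) AS n_ids
#          FROM df GROUP BY Type
df.groupby('Type').agg(
    n_uses=('ID', 'count'),    # non-null values per group
    n_ids=('ID', 'nunique'),   # distinct IDs per group
)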
By default, groupby preserves the order of rows within each group. When calling apply, the group_keys option adds the group keys to the index so you can tell which piece each row came from; the function you pass should reduce the dimensionality of its result if possible, and otherwise return a consistent type.
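A small sketch of the group_keys behaviour in recent pandas (the lambda is just illustrative; selecting the non-key columns avoids applying over the grouping column):

# With group_keys=True, 'Name' is prepended to the result's index,
# labelling which group each row came from.
df.groupby('Name', group_keys=True)[['Type', 'ID']].apply(lambda g: g.head(1))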
You want the following:
In [20]: df.groupby(['Name','Type','ID']).size().reset_index(name='Count')
Out[20]:
    Name   Type  ID  Count
0  Book1  ebook   1      2
1  Book2  paper   2      2
2  Book3  paper   3      1
In your case the 'Name', 'Type' and 'ID' columns match in values, so we can groupby on these, call size to count the rows in each group, and then reset_index, passing name='Count' to label the new column. (count() would count non-null values in the remaining columns, but here every column is part of the grouping, so size is the method that counts rows per group.)
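To see why the reset_index step is needed: the intermediate result carries the group keys as a MultiIndex, roughly like this:

counts = df.groupby(['Name', 'Type', 'ID']).size()  # Series with a MultiIndex
# Name   Type   ID
# Book1  ebook  1     2
# Book2  paper  2     2
# Book3  paper  3     1
# dtype: int64
counts.reset_index(name='Count')  # flatten the keys back into columns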
An alternative approach would be to add the 'Count' column using transform and then call drop_duplicates:
In [25]: df['Count'] = df.groupby(['Name'])['ID'].transform('count')
         df.drop_duplicates()
Out[25]:
    Name   Type  ID  Count
0  Book1  ebook   1      2
1  Book2  paper   2      2
2  Book3  paper   3      1
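This works because transform, unlike an aggregation, returns a result aligned with the original index, one value per input row, so it can be assigned straight back as a column. A quick sketch of that alignment:

per_row = df.groupby('Name')['ID'].transform('count')
print(len(per_row) == len(df))          # True: one value per original row
print(per_row.index.equals(df.index))   # True: same index, so assignment aligns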
I think as_index=False should do the trick; it keeps the group keys as regular columns, so the separate reset_index call isn't needed:

df.groupby(['Name','Type','ID'], as_index=False).size()
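In recent pandas this names the count column 'size' rather than 'Count'; a sketch of renaming it to match the desired output (assuming the example df above):

out = df.groupby(['Name', 'Type', 'ID'], as_index=False).size()
out = out.rename(columns={'size': 'Count'})  # match the requested column name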