Pandas Counting Unique Rows

Tags: python, pandas, python-2.7, counter

I have a pandas data frame similar to:

ColA ColB
1    1
1    1
1    1
1    2
1    2
2    1
3    2

I want output that works the same way as Counter: I need to know how many times each row appears (with all of the columns being the same).

In this case the proper output would be:

ColA ColB Count
1    1    3
1    2    2
2    1    1
3    2    1

I have tried something of the sort:

df.groupby(['ColA','ColB']).ColA.count()

but this gives me some ugly output that I am having trouble formatting.

860

asked Mar 15 '16 18:03

qwertylpc

2 Answers

You can use size with reset_index:

print(df.groupby(['ColA', 'ColB']).size().reset_index(name='Count'))
   ColA  ColB  Count
0     1     1      3
1     1     2      2
2     2     1      1
3     3     2      1
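
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same size/reset_index chain. The variable name counts is only illustrative, and the print call is written so it runs on both Python 2.7 and Python 3.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'ColA': [1, 1, 1, 1, 1, 2, 3],
    'ColB': [1, 1, 1, 2, 2, 1, 2],
})

# size() counts the rows in each (ColA, ColB) group and returns a Series
# whose index is built from the group keys. reset_index(name='Count')
# turns those keys back into ordinary columns and names the count column.
counts = df.groupby(['ColA', 'ColB']).size().reset_index(name='Count')
print(counts)
```

The "ugly output" in the question is most likely the intermediate Series with its two-level index; reset_index is what flattens it into a regular table.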

183

answered Sep 22 '22 23:09

jezrael

I only needed to count the unique rows and have used the DataFrame.drop_duplicates alternative as below:

len(df[['ColA', 'ColB']].drop_duplicates())

It was twice as fast on my data as len(df.groupby(['ColA', 'ColB'])).
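
As a rough check, the sketch below runs both expressions on the sample frame from the question and confirms they report the same number of distinct rows. The names n_unique and n_groups are made up for illustration, and no timing is shown because the speed difference will depend on your data.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'ColA': [1, 1, 1, 1, 1, 2, 3],
    'ColB': [1, 1, 1, 2, 2, 1, 2],
})

# Number of distinct (ColA, ColB) rows via drop_duplicates.
n_unique = len(df[['ColA', 'ColB']].drop_duplicates())

# Number of groups via groupby; len() of a GroupBy object is the group count.
n_groups = len(df.groupby(['ColA', 'ColB']))

print(n_unique, n_groups)
```

One caveat: if the key columns contain NaN, the two counts can differ, because groupby drops rows with missing keys by default while drop_duplicates keeps them.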

answered Sep 22 '22 23:09

eddygeek
