Converting a Pandas GroupBy output from Series to DataFrame

Tags: python, pandas, dataframe, pandas-groupby, multi-index

I'm starting with input data like this

    import pandas

    df1 = pandas.DataFrame({
        "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
        "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
    })

Which when printed appears as this:

           City     Name
    0   Seattle    Alice
    1   Seattle      Bob
    2  Portland  Mallory
    3   Seattle  Mallory
    4   Seattle      Bob
    5  Portland  Mallory

Grouping is simple enough:

    g1 = df1.groupby(["Name", "City"]).count()

and printing yields a GroupBy object:

                      City  Name
    Name    City
    Alice   Seattle      1     1
    Bob     Seattle      2     2
    Mallory Portland     2     2
            Seattle      1     1

But what I want eventually is another DataFrame object that contains all the rows in the GroupBy object. In other words I want to get the following result:

                      City  Name
    Name    City
    Alice   Seattle      1     1
    Bob     Seattle      2     2
    Mallory Portland     2     2
    Mallory Seattle      1     1

I can't quite see how to accomplish this in the pandas documentation. Any hints would be welcome.


asked Apr 29 '12 16:04

saveenr

2 Answers

g1 here is a DataFrame. It has a hierarchical index, though:

    In [19]: type(g1)
    Out[19]: pandas.core.frame.DataFrame

    In [20]: g1.index
    Out[20]:
    MultiIndex([('Alice', 'Seattle'), ('Bob', 'Seattle'), ('Mallory', 'Portland'),
           ('Mallory', 'Seattle')], dtype=object)

Perhaps you want something like this?

    In [21]: g1.add_suffix('_Count').reset_index()
    Out[21]:
          Name      City  City_Count  Name_Count
    0    Alice   Seattle           1           1
    1      Bob   Seattle           2           2
    2  Mallory  Portland           2           2
    3  Mallory   Seattle           1           1

Or something like:

    In [36]: DataFrame({'count' : df1.groupby( [ "Name", "City"] ).size()}).reset_index()
    Out[36]:
          Name      City  count
    0    Alice   Seattle      1
    1      Bob   Seattle      2
    2  Mallory  Portland      2
    3  Mallory   Seattle      1
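For anyone who wants to run the second approach outside an IPython session, here is a minimal self-contained sketch. It assumes a reasonably recent pandas release; the variable names `counts` and `result` are only for illustration and are not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

# size() returns a Series whose index is a (Name, City) MultiIndex
counts = df1.groupby(["Name", "City"]).size()

# to_frame() names the value column; reset_index() turns the
# index levels back into ordinary columns
result = counts.to_frame("count").reset_index()
print(result)
```

The result is a flat DataFrame with the columns `Name`, `City` and `count`, with the group keys repeated on every row.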

answered Sep 22 '22 02:09

Wes McKinney

I want to slightly amend the answer given by Wes, because pandas 0.16.2 requires as_index=False. If you don't set it, you get an empty DataFrame.

Source:

Aggregation functions will not return the groups that you are aggregating over if they are named columns, when as_index=True, the default. The grouped columns will be the indices of the returned object.

Passing as_index=False will return the groups that you are aggregating over, if they are named columns.

Aggregating functions are ones that reduce the dimension of the returned objects, for example: mean, sum, size, count, std, var, sem, describe, first, last, nth, min, max. This is what happens when you do for example DataFrame.sum() and get back a Series.

nth can act as a reducer or a filter.
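To illustrate what the quoted passage describes, here is a small sketch of the two `as_index` settings as they behave in recent pandas versions. The `Sales` column is a made-up value column, added only so that there is something to aggregate; it is not part of the question's data.

```python
import pandas as pd

# "Sales" is a hypothetical value column used only for this illustration
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Seattle", "Portland"],
    "Sales": [10, 20, 30, 40],
})

# Default (as_index=True): the group keys become a MultiIndex
indexed = df.groupby(["Name", "City"]).sum()

# as_index=False: the group keys stay as ordinary columns
flat = df.groupby(["Name", "City"], as_index=False).sum()

print(indexed)
print(flat)
```

With the default, `Name` and `City` end up in the index and only `Sales` remains as a column; with `as_index=False` all three are columns.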

    import pandas as pd

    df1 = pd.DataFrame({"Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
                        "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"]})
    print(df1)
    #
    #       City     Name
    #0   Seattle    Alice
    #1   Seattle      Bob
    #2  Portland  Mallory
    #3   Seattle  Mallory
    #4   Seattle      Bob
    #5  Portland  Mallory
    #
    g1 = df1.groupby(["Name", "City"], as_index=False).count()
    print(g1)
    #
    #                  City  Name
    #Name    City
    #Alice   Seattle      1     1
    #Bob     Seattle      2     2
    #Mallory Portland     2     2
    #        Seattle      1     1
    #

EDIT:

In version 0.17.1 and later, you can select a subset of columns before calling count, or call size and pass the name parameter to reset_index:

    print(df1.groupby(["Name", "City"], as_index=False).count())
    #IndexError: list index out of range

    print(df1.groupby(["Name", "City"]).count())
    #Empty DataFrame
    #Columns: []
    #Index: [(Alice, Seattle), (Bob, Seattle), (Mallory, Portland), (Mallory, Seattle)]

    print(df1.groupby(["Name", "City"])[['Name', 'City']].count())
    #                  Name  City
    #Name    City
    #Alice   Seattle      1     1
    #Bob     Seattle      2     2
    #Mallory Portland     2     2
    #        Seattle      1     1

    print(df1.groupby(["Name", "City"]).size().reset_index(name='count'))
    #      Name      City  count
    #0    Alice   Seattle      1
    #1      Bob   Seattle      2
    #2  Mallory  Portland      2
    #3  Mallory   Seattle      1

The difference between count and size is that size counts NaN values while count does not.
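A quick sketch of that difference, using a hypothetical `Score` column that contains one missing value (the column and its values are assumptions made up for this example):

```python
import numpy as np
import pandas as pd

# "Score" is a hypothetical column; Alice has one NaN entry
df = pd.DataFrame({
    "Name": ["Alice", "Alice", "Bob"],
    "Score": [1.0, np.nan, 3.0],
})

grouped = df.groupby("Name")
sizes = grouped.size()             # rows per group, NaN included
counts = grouped["Score"].count()  # non-NaN values per group

print(sizes)
print(counts)
```

Here `sizes` reports two rows for Alice, while `counts` reports only one, because the NaN score is skipped.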

answered Sep 21 '22 02:09

jezrael
