I'm starting with input data like this:

    df1 = pandas.DataFrame({
        "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
        "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"]
    })

Which, when printed, appears as this:

           City     Name
    0   Seattle    Alice
    1   Seattle      Bob
    2  Portland  Mallory
    3   Seattle  Mallory
    4   Seattle      Bob
    5  Portland  Mallory

Grouping is simple enough:

    g1 = df1.groupby(["Name", "City"]).count()

and printing yields a GroupBy object:

                      City  Name
    Name    City
    Alice   Seattle      1     1
    Bob     Seattle      2     2
    Mallory Portland     2     2
            Seattle      1     1

But what I want eventually is another DataFrame object that contains all the rows in the GroupBy object. In other words, I want to get the following result:

                      City  Name
    Name    City
    Alice   Seattle      1     1
    Bob     Seattle      2     2
    Mallory Portland     2     2
    Mallory Seattle      1     1

I can't quite see how to accomplish this in the pandas documentation. Any hints would be welcome.
The to_frame() function converts a given Series object to a DataFrame. Parameter: name: the passed name substitutes for the Series name (if it has one).
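As a rough illustration of to_frame() applied to the data in the question, here is a minimal sketch. It assumes a reasonably recent pandas; the variable names sizes and counts and the column name "count" are my own choices, not something from the thread.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

# size() returns a Series indexed by the (Name, City) pairs
sizes = df1.groupby(["Name", "City"]).size()

# to_frame(name=...) turns that Series into a one-column DataFrame;
# reset_index() then moves Name and City out of the index into columns
counts = sizes.to_frame(name="count").reset_index()
print(counts)
```

The result should be a flat DataFrame with one row per (Name, City) pair, which seems to be what the question is after.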
Using groupby() to group rows into a list: with the DataFrame.groupby() function you can group rows on a column, select the column you want from the grouped result, and convert it to a list for each group using apply(list).
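A small sketch of the apply(list) idea, reusing the question's data. The name cities_by_name is hypothetical and only used for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

# Group on Name, select the City column, and collect each group's values into a list
cities_by_name = df1.groupby("Name")["City"].apply(list)
print(cities_by_name)

# reset_index() turns the resulting Series into a two-column DataFrame
cities_df = cities_by_name.reset_index()
print(cities_df)
```

Note that this keeps duplicates within each group, so it answers a slightly different question than counting rows per group.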
g1 here is a DataFrame. It has a hierarchical index, though:

    In [19]: type(g1)
    Out[19]: pandas.core.frame.DataFrame

    In [20]: g1.index
    Out[20]:
    MultiIndex([('Alice', 'Seattle'), ('Bob', 'Seattle'), ('Mallory', 'Portland'),
           ('Mallory', 'Seattle')], dtype=object)

Perhaps you want something like this?

    In [21]: g1.add_suffix('_Count').reset_index()
    Out[21]:
          Name      City  City_Count  Name_Count
    0    Alice   Seattle           1           1
    1      Bob   Seattle           2           2
    2  Mallory  Portland           2           2
    3  Mallory   Seattle           1           1

Or something like:

    In [36]: DataFrame({'count' : df1.groupby( [ "Name", "City"] ).size()}).reset_index()
    Out[36]:
          Name      City  count
    0    Alice   Seattle      1
    1      Bob   Seattle      2
    2  Mallory  Portland      2
    3  Mallory   Seattle      1

I want to slightly change the answer given by Wes, because version 0.16.2 requires as_index=False. If you don't set it, you get an empty dataframe.
Source:
Aggregation functions will not return the groups that you are aggregating over if they are named columns, when as_index=True, the default. The grouped columns will be the indices of the returned object.

Passing as_index=False will return the groups that you are aggregating over, if they are named columns.

Aggregating functions are ones that reduce the dimension of the returned objects, for example: mean, sum, size, count, std, var, sem, describe, first, last, nth, min, max. This is what happens when you do for example DataFrame.sum() and get back a Series.

nth can act as a reducer or a filter, see here.
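To make the as_index distinction concrete, here is a minimal sketch on a recent pandas. The numeric "Visits" column is hypothetical; I added it only so there is something to aggregate, it is not part of the original data.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
    "Visits": [1, 2, 3, 4, 5, 6],  # hypothetical column, for illustration only
})

# Default (as_index=True): the grouping columns become a MultiIndex
indexed = df.groupby(["Name", "City"])["Visits"].sum()
print(indexed)

# as_index=False: the grouping columns come back as ordinary columns
flat = df.groupby(["Name", "City"], as_index=False)["Visits"].sum()
print(flat)
```

In the second form no reset_index() call is needed, because Name and City are already regular columns of the result.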

    import pandas as pd

    df1 = pd.DataFrame({"Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
                        "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"]})
    print(df1)
    #
    #       City     Name
    #0   Seattle    Alice
    #1   Seattle      Bob
    #2  Portland  Mallory
    #3   Seattle  Mallory
    #4   Seattle      Bob
    #5  Portland  Mallory
    #
    g1 = df1.groupby(["Name", "City"], as_index=False).count()
    print(g1)
    #
    #                  City  Name
    #Name    City
    #Alice   Seattle      1     1
    #Bob     Seattle      2     2
    #Mallory Portland     2     2
    #        Seattle      1     1
    #

EDIT:
In version 0.17.1 and later you can select a subset of columns before calling count, or call size and pass the name parameter to reset_index:

    print(df1.groupby(["Name", "City"], as_index=False).count())
    #IndexError: list index out of range

    print(df1.groupby(["Name", "City"]).count())
    #Empty DataFrame
    #Columns: []
    #Index: [(Alice, Seattle), (Bob, Seattle), (Mallory, Portland), (Mallory, Seattle)]

    print(df1.groupby(["Name", "City"])[['Name', 'City']].count())
    #                  Name  City
    #Name    City
    #Alice   Seattle      1     1
    #Bob     Seattle      2     2
    #Mallory Portland     2     2
    #        Seattle      1     1

    print(df1.groupby(["Name", "City"]).size().reset_index(name='count'))
    #      Name      City  count
    #0    Alice   Seattle      1
    #1      Bob   Seattle      2
    #2  Mallory  Portland      2
    #3  Mallory   Seattle      1

The difference between count and size is that size counts NaN values while count does not.
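The size versus count difference is easiest to see with a missing value in a non-grouping column. This is a minimal sketch with made-up data (the three rows below are my own, not from the question), assuming a recent pandas and NumPy.

```python
import numpy as np
import pandas as pd

# Made-up data: Alice has one row with a missing City
df = pd.DataFrame({
    "Name": ["Alice", "Alice", "Bob"],
    "City": ["Seattle", np.nan, "Portland"],
})

# size() counts rows per group, including rows where City is NaN
sizes = df.groupby("Name").size()
print(sizes)

# count() counts only the non-NaN values of City in each group
counts = df.groupby("Name")["City"].count()
print(counts)
```

Here Alice has two rows but only one non-missing City, so the two results differ for her group and agree for Bob.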