returning aggregated dataframe from pandas groupby

Tags: python, pandas, group-by

I'm trying to wrap my head around Pandas groupby methods. I'd like to write a function that performs some aggregations and then returns a Pandas DataFrame. Here's a grossly simplified example using sum(). I know there are easier ways to do simple sums; in real life my function is more complex:

import pandas as pd
df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B'], 'col2':[1.0, 2, 3, 4]})

In [3]: df
Out[3]: 
  col1  col2
0    A     1
1    A     2
2    B     3
3    B     4

def func2(df):
    dfout = pd.DataFrame({ 'col1' : df['col1'].unique() ,
                           'someData': sum(df['col2']) })
    return  dfout

t = df.groupby('col1').apply(func2)

In [6]: t
Out[6]: 
       col1  someData
col1                 
A    0    A         3
B    0    B         7

I did not expect to have col1 in there twice, nor did I expect that mystery index-looking thing. I really thought I would just get col1 & someData.

In my real-life application I'm grouping by more than one column and would really like to get back a DataFrame and not a Series object.
Any ideas for a solution, or an explanation of what Pandas is doing in my example above?

----- added info -----

I should have started with this example, I think:

In [13]: import pandas as pd

In [14]: df = pd.DataFrame({'col1':['A','A','A','B','B','B'], 'col2':['C','D','D','D','C','C'], 'col3':[.1,.2,.4,.6,.8,1]})

In [15]: df
Out[15]: 
  col1 col2  col3
0    A    C   0.1
1    A    D   0.2
2    A    D   0.4
3    B    D   0.6
4    B    C   0.8
5    B    C   1.0

In [16]: def func3(df):
   ....:         dfout =  sum(df['col3']**2)
   ....:         return  dfout
   ....: 

In [17]: t = df.groupby(['col1', 'col2']).apply(func3)

In [18]: t
Out[18]: 
col1  col2
A     C       0.01
      D       0.20
B     C       1.64
      D       0.36

In the above illustration, the result of the apply() function is a Pandas Series, and it lacks the groupby columns from the df.groupby. The essence of what I'm struggling with is this: how do I create a function that I can apply to a groupby and that returns both the result of the function AND the columns on which it was grouped?

----- yet another update ------

It appears that if I then do this:

 pd.DataFrame(t).reset_index()

I get back a dataframe which is really close to what I was after.

863

asked Feb 21 '13 13:02

JD Long

1 Answer

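The reason you are seeing the extra index level of 0s is that the output of .unique() is an array. Your function therefore builds a one-row DataFrame for each group, with its own default index (0), and apply stacks those DataFrames under the group labels.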
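To see where the 0 comes from, here is a minimal sketch that calls the question's func2 directly on the rows of group 'A', without going through groupby at all. The boolean-mask selection of the group is my own shortcut for illustration, not something from the original post.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B'], 'col2': [1.0, 2, 3, 4]})

def func2(group):
    # .unique() returns a length-1 array here, so this builds a
    # one-row DataFrame with a fresh default index
    return pd.DataFrame({'col1': group['col1'].unique(),
                         'someData': sum(group['col2'])})

# Select the rows of group 'A' by hand (illustrative shortcut)
group_a = df[df['col1'] == 'A']
out = func2(group_a)

print(out)
print(list(out.index))  # the per-group index that apply later stacks
```

Each group yields a frame like this one, and apply places the group label in front of that per-group index, which is the two-level index shown in the question.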

The best way to understand how your apply is going to work is to inspect each action group-wise:

In [11]: g = df.groupby('col1')

In [12]: g.get_group('A')
Out[12]: 
  col1  col2
0    A     1
1    A     2

In [13]: g.get_group('A')['col1'].unique()
Out[13]: array([A], dtype=object)

In [14]: sum(g.get_group('A')['col2'])
Out[14]: 3.0
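The same group-wise inspection can be done for every group at once by iterating over the groupby object. This is only a sketch of that idea; the dictionary name group_sums is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B'], 'col2': [1.0, 2, 3, 4]})
g = df.groupby('col1')

# Iterating a groupby yields (label, sub-DataFrame) pairs
group_sums = {}
for label, group in g:
    print(label)
    print(group)
    group_sums[label] = group['col2'].sum()

print(group_sums)
```

Looking at each sub-DataFrame this way shows exactly what the function passed to apply receives.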

Most of the time you want the function passed to apply to return an aggregated value for each group (a scalar, or a Series of named values) rather than a DataFrame.

When the function aggregates, the output of grouped.apply has the group labels as its index (the unique values of 'col1'), so constructing col1 again inside the function is redundant.
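As a rough sketch of that point, the function below returns a Series of named aggregates and never touches col1; the group labels still end up in the index of the result. The function name summarize and the output names total and n_rows are hypothetical, and selecting [['col2']] before apply is my own choice so that the function only sees the value column.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'B', 'B'], 'col2': [1.0, 2, 3, 4]})

def summarize(group):
    # Return one named value per output column; no need to rebuild col1
    return pd.Series({'total': group['col2'].sum(),
                      'n_rows': len(group)})

# Restrict to the value column so the function never sees the grouping key
result = df.groupby('col1')[['col2']].apply(summarize)

print(result)             # indexed by col1, columns: total, n_rows
print(result.index.name)
```

Because every group returns a Series with the same labels, the pieces line up into a DataFrame with one row per group.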

Note: To pop 'col1' (the index) back to a column you can call reset_index. In this case:

In [15]: g.sum().reset_index()
Out[15]: 
  col1  col2
0    A     3
1    B     7
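The same pattern should carry over to the two-column grouping from the second example in the question. The sketch below uses named aggregation with a custom function and then reset_index; the names sum_of_squares and sum_sq are hypothetical, and this is one possible approach rather than the only one.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'A', 'A', 'B', 'B', 'B'],
    'col2': ['C', 'D', 'D', 'D', 'C', 'C'],
    'col3': [.1, .2, .4, .6, .8, 1],
})

def sum_of_squares(values):
    # values is the col3 Series for one (col1, col2) group
    return (values ** 2).sum()

result = (
    df.groupby(['col1', 'col2'])
      .agg(sum_sq=('col3', sum_of_squares))  # named aggregation
      .reset_index()                         # group keys back to columns
)

print(result)
```

The result is a plain DataFrame with col1, col2 and the aggregated value as ordinary columns, which is the shape the question asks for.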

121

answered Oct 07 '22 07:10

Andy Hayden
