Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can pandas groupby aggregate into a list, rather than sum, mean, etc?

Tags:

python

pandas

I've had success using the groupby function to sum or average a given variable by groups, but is there a way to aggregate into a list of values, rather than to get a single result? (And would this still be called aggregation?)

I am not entirely sure this is the approach I should be taking anyhow, so below is an example of the transformation I'd like to make, with toy data.

That is, if the data look something like this:

    A    B    C       1    10   22     1    12   20     1    11   8     1    10   10     2    11   13     2    12   10      3    14   0 

What I am trying to end up with is something like the following. I am not totally sure whether this can be done through groupby aggregating into lists, and am rather lost as to where to go from here.

Hypothetical output:

     A    B    C  New1  New2  New3  New4  New5  New6     1    10   22  12    20    11    8     10    10     2    11   13  12    10      3    14   0 

Perhaps I should be pursuing pivots instead? The order by which the data are put into columns does not matter - all columns B through New6 in this example are equivalent. All suggestions/corrections are much appreciated.

like image 533
M.A.Kline Avatar asked Oct 23 '13 00:10

M.A.Kline


People also ask

What does aggregate in pandas do?

Pandas DataFrame aggregate() Method The aggregate() method allows you to apply a function or a list of function names to be executed along one of the axis of the DataFrame, default 0, which is the index (row) axis. Note: the agg() method is an alias of the aggregate() method.


2 Answers

I used the following

grouped = df.groupby('A')  df = grouped.aggregate(lambda x: tuple(x))  df['grouped'] = df['B'] + df['C'] 
like image 69
user2623954 Avatar answered Oct 12 '22 22:10

user2623954


I am answering the question as stated in its title and first sentence: the following aggregates values to lists.

import pandas as pd  df = pd.DataFrame( {'A' : [1, 1, 1, 1, 2, 2, 3], 'B' : [10, 12, 11, 10, 11, 12, 14], 'C' : [22, 20,     8, 10, 13, 10, 0]}) print df  df2 = df.groupby('A').aggregate(lambda tdf: tdf.unique().tolist()) print df2 # Old version: # df2=df.groupby(['A']).apply(lambda tdf: pd.Series(  dict([[vv,tdf[vv].unique().tolist()] for vv in tdf if vv not in ['A']])  ))  

The output is as follows:

In [3]: run tmp    A   B   C 0  1  10  22 1  1  12  20 2  1  11   8 3  1  10  10 4  2  11  13 5  2  12  10 6  3  14   0  [7 rows x 3 columns]               B                C A                                1  [10, 12, 11]  [22, 20, 8, 10] 2      [11, 12]         [13, 10] 3          [14]              [0]  [3 rows x 2 columns] 
like image 31
CPBL Avatar answered Oct 12 '22 21:10

CPBL