Can pandas groupby aggregate into a list, rather than sum, mean, etc?

Tags:

I've had success using the groupby function to sum or average a given variable by groups, but is there a way to aggregate into a list of values, rather than to get a single result? (And would this still be called aggregation?)

I am not entirely sure this is the approach I should be taking anyhow, so below is an example of the transformation I'd like to make, with toy data.

That is, if the data look something like this:

    A    B    C       1    10   22     1    12   20     1    11   8     1    10   10     2    11   13     2    12   10      3    14   0

What I am trying to end up with is something like the following. I am not totally sure whether this can be done through groupby aggregating into lists, and am rather lost as to where to go from here.

Hypothetical output:

     A    B    C  New1  New2  New3  New4  New5  New6     1    10   22  12    20    11    8     10    10     2    11   13  12    10      3    14   0

Perhaps I should be pursuing pivots instead? The order by which the data are put into columns does not matter - all columns B through New6 in this example are equivalent. All suggestions/corrections are much appreciated.

533

asked Oct 23 '13 00:10

M.A.Kline

2 Answers

I used the following

grouped = df.groupby('A')  df = grouped.aggregate(lambda x: tuple(x))  df['grouped'] = df['B'] + df['C']

answered Oct 12 '22 22:10

user2623954

I am answering the question as stated in its title and first sentence: the following aggregates values to lists.

import pandas as pd  df = pd.DataFrame( {'A' : [1, 1, 1, 1, 2, 2, 3], 'B' : [10, 12, 11, 10, 11, 12, 14], 'C' : [22, 20,     8, 10, 13, 10, 0]}) print df  df2 = df.groupby('A').aggregate(lambda tdf: tdf.unique().tolist()) print df2 # Old version: # df2=df.groupby(['A']).apply(lambda tdf: pd.Series(  dict([[vv,tdf[vv].unique().tolist()] for vv in tdf if vv not in ['A']])  ))

The output is as follows:

In [3]: run tmp    A   B   C 0  1  10  22 1  1  12  20 2  1  11   8 3  1  10  10 4  2  11  13 5  2  12  10 6  3  14   0  [7 rows x 3 columns]               B                C A                                1  [10, 12, 11]  [22, 20, 8, 10] 2      [11, 12]         [13, 10] 3          [14]              [0]  [3 rows x 2 columns]

answered Oct 12 '22 21:10

CPBL

Related questions
                            
                                Python exception handling - line number
                            
                                How to append data to a json file?
                            
                                Plot inline or a separate window using Matplotlib in Spyder IDE
                            
                                Check if requirements are up to date
                            
                                Check for pending Django migrations
                            
                                Browse files and subfolders in Python
                            
                                Is it possible to have an actual memory leak in Python because of your code?
                            
                                Using Python's ftplib to get a directory listing, portably
                            
                                Temporarily Redirect stdout/stderr
                            
                                Format string unused named arguments [duplicate]
                            
                                How to edit a seaborn legend title and labels for figure-level functions
                            
                                Clone an image in cv2 python
                            
                                Logarithmic returns in pandas dataframe
                            
                                Is everything an object in Python like Ruby?
                            
                                Django Template Ternary Operator
                            
                                pip3: bad interpreter: No such file or directory
                            
                                Failed to get convolution algorithm. This is probably because cuDNN failed to initialize,
                            
                                Python: Removing spaces from list objects [duplicate]
                            
                                return column names from pyodbc execute() statement
                            
                                Error running basic tensorflow example

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Can pandas groupby aggregate into a list, rather than sum, mean, etc?

Tags:

python

pandas

M.A.Kline

People also ask

2 Answers

user2623954

CPBL

Recent Activity

Donate For Us