I've had success using the groupby function to sum or average a given variable by groups, but is there a way to aggregate into a list of values, rather than to get a single result? (And would this still be called aggregation?)
I am not entirely sure this is the approach I should be taking anyhow, so below is an example of the transformation I'd like to make, with toy data.
That is, if the data look something like this:
A B C 1 10 22 1 12 20 1 11 8 1 10 10 2 11 13 2 12 10 3 14 0
What I am trying to end up with is something like the following. I am not totally sure whether this can be done through groupby aggregating into lists, and am rather lost as to where to go from here.
Hypothetical output:
A B C New1 New2 New3 New4 New5 New6 1 10 22 12 20 11 8 10 10 2 11 13 12 10 3 14 0
Perhaps I should be pursuing pivots instead? The order by which the data are put into columns does not matter - all columns B through New6 in this example are equivalent. All suggestions/corrections are much appreciated.
Pandas DataFrame aggregate() Method The aggregate() method allows you to apply a function or a list of function names to be executed along one of the axis of the DataFrame, default 0, which is the index (row) axis. Note: the agg() method is an alias of the aggregate() method.
I used the following
grouped = df.groupby('A') df = grouped.aggregate(lambda x: tuple(x)) df['grouped'] = df['B'] + df['C']
I am answering the question as stated in its title and first sentence: the following aggregates values to lists.
import pandas as pd df = pd.DataFrame( {'A' : [1, 1, 1, 1, 2, 2, 3], 'B' : [10, 12, 11, 10, 11, 12, 14], 'C' : [22, 20, 8, 10, 13, 10, 0]}) print df df2 = df.groupby('A').aggregate(lambda tdf: tdf.unique().tolist()) print df2 # Old version: # df2=df.groupby(['A']).apply(lambda tdf: pd.Series( dict([[vv,tdf[vv].unique().tolist()] for vv in tdf if vv not in ['A']]) ))
The output is as follows:
In [3]: run tmp A B C 0 1 10 22 1 1 12 20 2 1 11 8 3 1 10 10 4 2 11 13 5 2 12 10 6 3 14 0 [7 rows x 3 columns] B C A 1 [10, 12, 11] [22, 20, 8, 10] 2 [11, 12] [13, 10] 3 [14] [0] [3 rows x 2 columns]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With