
Can I create a column where each row is a running list in a Pandas DataFrame using groupby?

Imagine I have a Pandas DataFrame:

import pandas as pd

# create df
df = pd.DataFrame({'id': [1,1,1,2,2,2],
                   'val': [5,4,6,3,2,3]})

Let's assume it is ordered by 'id' and by an imaginary date column (not shown, ascending). I want to create another column where each row holds the list of 'val' values seen so far for that 'id', up to and including that row's date.

The ending DataFrame will look like this:

df = pd.DataFrame({'id': [1,1,1,2,2,2],
                   'val': [5,4,6,3,2,3],
                   'val_list': [[5],[5,4],[5,4,6],[3],[3,2],[3,2,3]]})

I don't want to use a loop because the actual df I am working with has about 4 million records. I am imagining I would use a lambda function in conjunction with groupby (something like this):

df['val_list'] = df.groupby('id')['val'].apply(lambda x: x.runlist())

This raises an AttributeError because the runlist() method does not exist, but I am thinking the solution would be something like this.

Does anyone know what to do to solve this problem?

Aaron England asked Jan 24 '23 15:01


1 Answer

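Let us try wrapping each value in a one-element list and taking a cumulative sum within each group. Adding two lists concatenates them, so the cumulative sum builds the running list: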

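# group_keys=False keeps the original row index so the result aligns with df
df['new'] = df.val.map(lambda x: [x]).groupby(df.id, group_keys=False).apply(lambda x: x.cumsum())
df['new']
Out[138]: 
0          [5]
1       [5, 4]
2    [5, 4, 6]
3          [3]
4       [3, 2]
5    [3, 2, 3]
Name: new, dtype: object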
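
If you would rather not rely on cumsum over a column of lists, the same running lists can be built with plain Python slicing per group. This is only a sketch of that variant, not the approach above: running_lists is a hypothetical helper name, and the code assumes df has a unique index and is already sorted the way you want within each 'id'. It still loops in Python, but once per group rather than once per row.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                   'val': [5, 4, 6, 3, 2, 3]})


def running_lists(values):
    """Hypothetical helper: return [v0], [v0, v1], ... for one group's values."""
    vals = values.tolist()
    return [vals[:i + 1] for i in range(len(vals))]


# Build one object Series per group, indexed like the group's rows
parts = []
for _, group in df.groupby('id', sort=False)['val']:
    parts.append(pd.Series(running_lists(group), index=group.index))

# Assignment aligns on the (assumed unique) index, so row order is preserved
df['val_list'] = pd.concat(parts)
```

Either way, every row stores its own copy of the prefix, so a group with n rows holds n(n+1)/2 values in total. With about 4 million records that is worth checking against your largest group before running it.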
BENY answered Jan 30 '23 07:01