 

Using pandas groupby().apply(list) on multiple columns at once [duplicate]

I'm trying to combine multiple rows of a DataFrame into one row per group, collecting the values of each column that differs between those rows into a list. Several columns differ in this way.

df.groupby('a')['b'].apply(list) works well when only one column ('b' in this case) needs to be turned into a list, but I can't figure out how to do it for multiple columns.
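A minimal sketch of the single-column case on the sample data. The DataFrame construction is an assumption rebuilt from the printed table in the question; it is not part of the original post.

```python
import pandas as pd

# Sample frame rebuilt from the question's printed table (assumed dtypes)
df = pd.DataFrame({
    "a": [1, 1, 2, 2, 2],
    "b": ["b", "b", "c", "c", "c"],
    "c": [1, 2, 1, 2, 3],
    "d": ["first", "second", "third", "fourth", "fifth"],
})

# One column collected per group: the result is a Series indexed by 'a'
b_lists = df.groupby("a")["b"].apply(list)
print(b_lists)
```

This only handles one selected column at a time, which is the limitation the question is about.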

Dataframe:

   a  b  c       d
0  1  b  1   first
1  1  b  2  second
2  2  c  1   third
3  2  c  2  fourth
4  2  c  3   fifth

Preferred DataFrame after the operation:

   a  b          c                       d
0  1  b     [1, 2]         [first, second]
1  2  c  [1, 2, 3]  [third, fourth, fifth]

Is there an easy way to do this?

asked May 13 '19 09:05 by MvR


People also ask

Can groupby be used with multiple columns in pandas?

Yes. groupby() accepts a list of column names to group by several columns at once, and agg() can then apply a single aggregation or several aggregations at the same time.
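A short sketch of grouping by two columns and applying several aggregations in one call, using the question's sample data (rebuilt here by assumption). The specific aggregations chosen are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 2, 2, 2],
    "b": ["b", "b", "c", "c", "c"],
    "c": [1, 2, 1, 2, 3],
    "d": ["first", "second", "third", "fourth", "fifth"],
})

# Group by two key columns; agg() applies three aggregations to 'c' at once.
# The result has a MultiIndex of (a, b) and one column per aggregation.
summary = df.groupby(["a", "b"])["c"].agg(["sum", "max", "count"])
print(summary)
```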

What can be passed to the groupby() method in pandas?

groupby() accepts several kinds of grouping keys: a column label or a list of column labels; a dict or pandas Series; a NumPy array or pandas Index; or an array-like iterable of these.
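A sketch of two non-column grouping keys on the same sample data. The label array and the index-to-group dict are hypothetical, chosen only to show how each key type is interpreted: the array is matched to rows by position, while the dict maps index labels to group names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 2, 2, 2],
    "b": ["b", "b", "c", "c", "c"],
    "c": [1, 2, 1, 2, 3],
    "d": ["first", "second", "third", "fourth", "fifth"],
})

# Hypothetical labels, one per row, aligned by position
labels = np.array(["x", "y", "x", "y", "x"])
by_array = df.groupby(labels)["c"].sum()

# Hypothetical mapping from index label to group name
halves = {0: "first_half", 1: "first_half",
          2: "second_half", 3: "second_half", 4: "second_half"}
by_dict = df.groupby(halves)["c"].sum()

print(by_array)
print(by_dict)
```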

Can pandas apply return multiple columns?

Yes. If the function passed to DataFrame.apply() returns a pandas Series, its values become separate columns of the result. Pass axis=1 so that the function is applied to each row of the DataFrame; each returned Series is then expanded into multiple columns.
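A minimal sketch of that pattern on the sample data. describe_row and the derived column names c_squared and label are hypothetical, introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 2, 2, 2],
    "b": ["b", "b", "c", "c", "c"],
    "c": [1, 2, 1, 2, 3],
    "d": ["first", "second", "third", "fourth", "fifth"],
})

def describe_row(row: pd.Series) -> pd.Series:
    # Hypothetical helper: derive two new values from a single row
    return pd.Series({"c_squared": row["c"] ** 2,
                      "label": f"{row['b']}-{row['d']}"})

# axis=1 calls describe_row once per row; each returned Series becomes a row
# of a new two-column DataFrame, which is then joined back on the index
new_cols = df.apply(describe_row, axis=1)
df = df.join(new_cols)
print(df)
```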


1 Answer

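import pandas as pd
# Lists of c and d per (a, b) group; reset_index() turns a and b back into columns so 4 names fit
df = df.groupby(['a','b']).apply(lambda x: [list(x['c']), list(x['d'])]).apply(pd.Series).reset_index()
df.columns = ['a','b','c','d']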

Output

   a  b          c                       d
0  1  b     [1, 2]         [first, second]
1  2  c  [1, 2, 3]  [third, fourth, fifth]
answered Sep 22 '22 19:09 by iamklaus
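Not part of the answer above: a possibly simpler alternative, assuming a reasonably recent pandas release in which groupby().agg() accepts the built-in list as the aggregation function. It collects every non-key column into lists in a single call, so it extends to any number of columns without naming them in a lambda.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 2, 2, 2],
    "b": ["b", "b", "c", "c", "c"],
    "c": [1, 2, 1, 2, 3],
    "d": ["first", "second", "third", "fourth", "fifth"],
})

# agg(list) builds one list per group for each non-key column ('c' and 'd');
# reset_index() moves the group keys 'a' and 'b' back into ordinary columns
result = df.groupby(["a", "b"]).agg(list).reset_index()
print(result)
```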