I'm trying to combine multiple rows of a dataframe into one row, with the values that differ between rows collected into a list. Several columns have differing values.
df.groupby('a')['b'].apply(list)
works well if only one column ('b' in this instance) has to be turned into a list, but I can't figure out how to do it for multiple columns.
Dataframe:
a b c d
0 1 b 1 first
1 1 b 2 second
2 2 c 1 third
3 2 c 2 fourth
4 2 c 3 fifth
Preferred dataframe after the operation:
a b c d
0 1 b [1, 2] [first, second]
1 2 c [1, 2, 3] [third, fourth, fifth]
Is there an easy way to do this?
groupby() can take a list of columns in order to group by several columns at once, and the aggregate functions can then apply one or several aggregations at the same time.
groupby() accepts several kinds of argument: a column or list of columns, a dict or pandas Series, a NumPy array or pandas Index, or an array-like iterable of these.
To return multiple columns from pandas apply(), have the applied function return a Series containing the new data; each entry of that Series becomes a column of the result. On a DataFrame, passing axis=1 applies the function to each row. Here the grouped apply() returns one list of lists per group, and the following .apply(pd.Series) expands it into separate columns.
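import pandas as pd
# Collect 'c' and 'd' into lists per (a, b) group; reset_index() turns the group keys back into columns
df = df.groupby(['a','b']).apply(lambda x: [list(x['c']), list(x['d'])]).apply(pd.Series).reset_index()
df.columns = ['a','b','c','d']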
Output
a b c d
0 1 b [1, 2] [first, second]
1 2 c [1, 2, 3] [third, fourth, fifth]
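As a possible alternative, the same result can probably be reached with agg() instead of apply(). The sketch below is not part of the answer above; it rebuilds the question's dataframe by hand and assumes that 'a' and 'b' are the grouping keys and that 'c' and 'd' are the only columns to collect. The name result is just illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "a": [1, 1, 2, 2, 2],
    "b": ["b", "b", "c", "c", "c"],
    "c": [1, 2, 1, 2, 3],
    "d": ["first", "second", "third", "fourth", "fifth"],
})

# Group by both key columns and collect each remaining column into a list.
# as_index=False keeps 'a' and 'b' as ordinary columns, so no renaming is needed.
result = df.groupby(["a", "b"], as_index=False).agg({"c": list, "d": list})
print(result)
```

Passing a dict to agg() names the columns to aggregate explicitly, so any column left out of the dict is simply dropped from the result.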