I have a pandas DataFrame:
import pandas as pd
df = pd.DataFrame({'a': [1,1,1,1,2,2,2], 'b': ['a','a','a','a','b','b','b'], 'c': ['o','o','o','o','p','p','p'], 'd': [ [2,3,4], [1,3,3,4], [3,3,1,2], [4,1,2], [8,2,1], [0,9,1,2,3], [4,3,1] ], 'e': [13,12,5,10,3,2,5] })
What I want is:
First, group by columns a, b and c (there are two groups).
Then, within each group, sort the rows by column e in ascending order.
Finally, within each group, concatenate the lists in column d.
So the result I want is:
result = pd.DataFrame({'a':[1,2], 'b':['a','b'], 'c':['o','p'], 'd':[[3,3,1,2,4,1,2,1,3,3,4,2,3,4],[0,9,1,2,3,8,2,1,4,3,1]]})
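For reference, a literal translation of the three steps (loop over the (a, b, c) groups, sort each group by e, then join its d lists) could look like the minimal sketch below. It is not meant to be elegant, only to pin down the expected behaviour; the names rows, ordered and merged are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 1, 1, 2, 2, 2],
    'b': ['a', 'a', 'a', 'a', 'b', 'b', 'b'],
    'c': ['o', 'o', 'o', 'o', 'p', 'p', 'p'],
    'd': [[2, 3, 4], [1, 3, 3, 4], [3, 3, 1, 2], [4, 1, 2],
          [8, 2, 1], [0, 9, 1, 2, 3], [4, 3, 1]],
    'e': [13, 12, 5, 10, 3, 2, 5],
})

rows = []
# Step 1: iterate over the groups defined by (a, b, c); the key is a tuple
for (a, b, c), grp in df.groupby(['a', 'b', 'c']):
    # Step 2: sort only this group's rows by e, ascending
    ordered = grp.sort_values('e')
    # Step 3: concatenate the d lists in that sorted order
    merged = []
    for lst in ordered['d']:
        merged.extend(lst)
    rows.append({'a': a, 'b': b, 'c': c, 'd': merged})

result = pd.DataFrame(rows)
print(result)
```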
Could anyone share a quick or elegant way to do this? Thanks very much.
You can sort by column e, group by a, b and c, and then use a list comprehension to concatenate (flatten) the d column. Sorting first and grouping afterwards works because groupby preserves the order of rows within each group, as stated in the pandas groupby documentation. Putting it together:
(df.sort_values('e').groupby(['a', 'b', 'c'])['d']
.apply(lambda g: [j for i in g for j in i]).reset_index())
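To see why sorting before grouping is enough, the self-contained sketch below also collects the e values of each group after the global sort; they come out ascending inside every group, which is exactly the order the list comprehension then flattens d in. The variable names sorted_df, e_order and out are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 1, 1, 2, 2, 2],
    'b': ['a', 'a', 'a', 'a', 'b', 'b', 'b'],
    'c': ['o', 'o', 'o', 'o', 'p', 'p', 'p'],
    'd': [[2, 3, 4], [1, 3, 3, 4], [3, 3, 1, 2], [4, 1, 2],
          [8, 2, 1], [0, 9, 1, 2, 3], [4, 3, 1]],
    'e': [13, 12, 5, 10, 3, 2, 5],
})

# Sort the whole frame once by e
sorted_df = df.sort_values('e')

# groupby keeps the sorted row order inside each (a, b, c) group
e_order = sorted_df.groupby(['a', 'b', 'c'])['e'].agg(list)

# Same chain as the answer: flatten the d lists of each group in that order
out = (sorted_df.groupby(['a', 'b', 'c'])['d']
       .apply(lambda g: [j for i in g for j in i])
       .reset_index())

print(e_order)
print(out)
```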
An alternative to the list comprehension is chain.from_iterable from itertools:
from itertools import chain
(df.sort_values('e').groupby(['a', 'b', 'c'])['d']
.apply(lambda g: list(chain.from_iterable(g))).reset_index())
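A different technique, not part of the original answer, is to let pandas do the flattening with DataFrame.explode (available since pandas 0.25) and then collect each group back into a list. This is a sketch under that version assumption; it relies on the same groupby order guarantee as the answers above.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 1, 1, 2, 2, 2],
    'b': ['a', 'a', 'a', 'a', 'b', 'b', 'b'],
    'c': ['o', 'o', 'o', 'o', 'p', 'p', 'p'],
    'd': [[2, 3, 4], [1, 3, 3, 4], [3, 3, 1, 2], [4, 1, 2],
          [8, 2, 1], [0, 9, 1, 2, 3], [4, 3, 1]],
    'e': [13, 12, 5, 10, 3, 2, 5],
})

out = (df.sort_values('e')
         .explode('d')                 # one row per list element, order kept
         .groupby(['a', 'b', 'c'])['d']
         .agg(list)                    # gather the elements back per group
         .reset_index())
print(out)
```

One design difference to keep in mind: explode turns an empty list into a single NaN row, so a group containing empty lists would end up with NaN entries, whereas the list comprehension and chain versions simply skip them.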