
python pandas groupby sorting and concatenating

I have a pandas DataFrame:

import pandas as pd
df = pd.DataFrame({'a': [1,1,1,1,2,2,2], 'b': ['a','a','a','a','b','b','b'], 'c': ['o','o','o','o','p','p','p'], 'd': [ [2,3,4], [1,3,3,4], [3,3,1,2], [4,1,2], [8,2,1], [0,9,1,2,3], [4,3,1] ], 'e': [13,12,5,10,3,2,5] })

What I want is:

First, group by columns a, b and c (there are two groups).

Then, within each group, sort the rows by column e in ascending order.

Finally, within each group, concatenate the lists in column d.
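For reference, the three steps can be translated literally into a loop over the groups. This is a minimal sketch, not the method from the answer: it sorts each group separately and flattens d with a list comprehension. kind='mergesort' is an addition here; it is a stable sort, so rows with equal e keep their original order. The names rows, ordered and flat are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 1, 2, 2, 2],
                   'b': ['a', 'a', 'a', 'a', 'b', 'b', 'b'],
                   'c': ['o', 'o', 'o', 'o', 'p', 'p', 'p'],
                   'd': [[2, 3, 4], [1, 3, 3, 4], [3, 3, 1, 2], [4, 1, 2],
                         [8, 2, 1], [0, 9, 1, 2, 3], [4, 3, 1]],
                   'e': [13, 12, 5, 10, 3, 2, 5]})

rows = []
# Step 1: iterate over the groups defined by the key columns a, b, c
for (a, b, c), group in df.groupby(['a', 'b', 'c']):
    # Step 2: sort this group's rows by e in ascending order (stable sort)
    ordered = group.sort_values('e', kind='mergesort')
    # Step 3: concatenate the lists in d, following the sorted row order
    flat = [x for lst in ordered['d'] for x in lst]
    rows.append({'a': a, 'b': b, 'c': c, 'd': flat})

result = pd.DataFrame(rows)
print(result)
```

Looping over groups in Python is usually slower than sorting once and grouping, but it mirrors the question step by step.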

So the result I want is:

result = pd.DataFrame({'a':[1,2], 'b':['a','b'], 'c':['o','p'], 'd':[[3,3,1,2,4,1,2,1,3,3,4,2,3,4],[0,9,1,2,3,8,2,1,4,3,1]]})

Could anyone share a quick or elegant way to do this? Thanks very much.

shenglih asked Sep 29 '16 01:09


1 Answer

You can sort by column e, group by a, b and c, and then use a list comprehension to concatenate (flatten) the lists in column d. Sorting first and grouping afterwards works because, as the pandas groupby documentation notes, groupby preserves the order of rows within each group:

(df.sort_values('e').groupby(['a', 'b', 'c'])['d']
                    .apply(lambda g: [j for i in g for j in i]).reset_index())
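To see why sorting the whole frame once is enough, one can look at the e values each group receives after the sort. This is a small check assuming the question's df. kind='mergesort' is an addition: sort_values defaults to quicksort, which is not stable, so rows with equal e in the same group could come out in either order. In this data the two rows with e = 5 are in different groups, so it makes no difference here.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 1, 2, 2, 2],
                   'b': ['a', 'a', 'a', 'a', 'b', 'b', 'b'],
                   'c': ['o', 'o', 'o', 'o', 'p', 'p', 'p'],
                   'd': [[2, 3, 4], [1, 3, 3, 4], [3, 3, 1, 2], [4, 1, 2],
                         [8, 2, 1], [0, 9, 1, 2, 3], [4, 3, 1]],
                   'e': [13, 12, 5, 10, 3, 2, 5]})

# Sort the whole frame once with a stable sort
sorted_df = df.sort_values('e', kind='mergesort')
# groupby keeps that row order inside each group, so each list is ascending
e_per_group = sorted_df.groupby(['a', 'b', 'c'])['e'].agg(list)
print(e_per_group)
```

The group (1, 'a', 'o') receives e in the order 5, 10, 12, 13 and the group (2, 'b', 'p') in the order 2, 3, 5, which is exactly the order in which the d lists get concatenated.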


An alternative to the list comprehension is chain.from_iterable from itertools:

from itertools import chain
(df.sort_values('e').groupby(['a', 'b', 'c'])['d']
                    .apply(lambda g: list(chain.from_iterable(g))).reset_index())
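Another option, assuming pandas 0.25 or later (where DataFrame.explode was added), is to explode d into one element per row and gather the elements back with agg(list). This is a swapped-in technique, not part of the original answer. One caveat: explode turns an empty list into a single NaN row, so empty lists in d would appear as NaN inside the concatenated lists.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 1, 2, 2, 2],
                   'b': ['a', 'a', 'a', 'a', 'b', 'b', 'b'],
                   'c': ['o', 'o', 'o', 'o', 'p', 'p', 'p'],
                   'd': [[2, 3, 4], [1, 3, 3, 4], [3, 3, 1, 2], [4, 1, 2],
                         [8, 2, 1], [0, 9, 1, 2, 3], [4, 3, 1]],
                   'e': [13, 12, 5, 10, 3, 2, 5]})

result = (df.sort_values('e', kind='mergesort')  # stable sort by e
            .explode('d')                         # one row per list element, order kept
            .groupby(['a', 'b', 'c'])['d']
            .agg(list)                            # collect the elements back into one list per group
            .reset_index())
print(result)
```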
Psidom answered Sep 19 '22 01:09