Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas groupby multiple columns, list of multiple columns

I have the following data:

Invoice NoStockCode Description                         Quantity    CustomerID  Country
536365  85123A      WHITE HANGING HEART T-LIGHT HOLDER  6           17850       United Kingdom
536365  71053       WHITE METAL LANTERN                 6           17850       United Kingdom
536365  84406B      CREAM CUPID HEARTS COAT HANGER      8           17850       United Kingdom

I am trying to do a groupby so i have the following operation:

df.groupby(['InvoiceNo','CustomerID','Country'])['NoStockCode','Description','Quantity'].apply(list)

I want to get the output

|Invoice |CustomerID |Country        |NoStockCode              |Description                                                                                 |Quantity       
|536365| |17850      |United Kingdom |85123A, 71053, 84406B    |WHITE HANGING HEART T-LIGHT HOLDER, WHITE METAL LANTERN, CREAM CUPID HEARTS COAT HANGER     |6, 6, 8            

Instead I get:

|Invoice |CustomerID |Country        |0         
|536365| |17850      |United Kingdom |['NoStockCode','Description','Quantity']

I have tried agg and other methods, but I haven't been able to get all of the columns to join as a list. I don't need to use the list function, but in the end I want the different columns to be lists.

like image 488
GrandmasLove Avatar asked Jul 29 '18 20:07

GrandmasLove


People also ask

How do you group by and sum multiple columns in pandas?

Use DataFrame. groupby(). sum() to group rows based on one or multiple columns and calculate sum agg function. groupby() function returns a DataFrameGroupBy object which contains an aggregate function sum() to calculate a sum of a given column for each group.

How do I group multiple columns?

A shortcut way to group rows or columns is to highlight the rows/columns you wish to group and use ALT+SHIFT+RIGHT ARROW to group the rows/columns, and ALT+SHIFT+LEFT ARROW to ungroup them. You can go multiple levels as well (so you could group rows 1-30, and then group rows 20-25 as a subgroup of the first).


1 Answers

I can't reproduce your code right now, but I think that:

print (df.groupby(['InvoiceNo','CustomerID','Country'], 
                  as_index=False)['NoStockCode','Description','Quantity']
          .agg(lambda x: list(x)))

would give you the expected output

like image 142
Ben.T Avatar answered Sep 28 '22 03:09

Ben.T