I have the following data:
Invoice NoStockCode Description Quantity CustomerID Country
536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 17850 United Kingdom
536365 71053 WHITE METAL LANTERN 6 17850 United Kingdom
536365 84406B CREAM CUPID HEARTS COAT HANGER 8 17850 United Kingdom
I am trying to do a groupby, so I have the following operation:
df.groupby(['InvoiceNo','CustomerID','Country'])['NoStockCode','Description','Quantity'].apply(list)
I want to get the output
|Invoice |CustomerID |Country |NoStockCode |Description |Quantity
|536365 |17850 |United Kingdom |85123A, 71053, 84406B |WHITE HANGING HEART T-LIGHT HOLDER, WHITE METAL LANTERN, CREAM CUPID HEARTS COAT HANGER |6, 6, 8
Instead I get:
|Invoice |CustomerID |Country |0
|536365 |17850 |United Kingdom |['NoStockCode','Description','Quantity']
I have tried agg and other methods, but I haven't been able to get all of the columns to join as a list. I don't need to use the list function, but in the end I want the different columns to be lists.
Use DataFrame.groupby().sum() to group rows by one or more columns and sum the values in each group. groupby() returns a DataFrameGroupBy object, whose sum() method computes the sum of each selected column per group.
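As a rough illustration of the groupby-and-sum pattern described above, here is a small sketch built from the sample rows in the question. The choice of grouping keys and the variable name `totals` are assumptions made for the example only.

```python
import pandas as pd

# Sample rows taken from the question; only the columns needed here.
df = pd.DataFrame({
    "InvoiceNo": [536365, 536365, 536365],
    "Country": ["United Kingdom"] * 3,
    "Quantity": [6, 6, 8],
})

# Group by invoice and country, then sum Quantity within each group.
totals = df.groupby(["InvoiceNo", "Country"])["Quantity"].sum()
print(totals)
```

Note that sum() collapses each group to a single number, so on its own it does not produce the per-group lists the question asks for.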
I can't reproduce your code right now, but I think that:
print(df.groupby(['InvoiceNo','CustomerID','Country'],
                 as_index=False)[['NoStockCode','Description','Quantity']]
        .agg(lambda x: list(x)))
would give you the expected output. Note the double brackets: selecting several columns from a groupby takes a list of column names.
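For anyone who wants to try this, here is a minimal self-contained sketch using the three sample rows from the question. The column names follow the ones used in the question's code ('InvoiceNo', 'NoStockCode'), which is an assumption since the header row in the sample data is ambiguous. The names `keys`, `value_cols`, `result` and `joined` are made up for the example.

```python
import pandas as pd

# Sample rows from the question; column names follow the question's code.
df = pd.DataFrame({
    "InvoiceNo": [536365, 536365, 536365],
    "NoStockCode": ["85123A", "71053", "84406B"],
    "Description": [
        "WHITE HANGING HEART T-LIGHT HOLDER",
        "WHITE METAL LANTERN",
        "CREAM CUPID HEARTS COAT HANGER",
    ],
    "Quantity": [6, 6, 8],
    "CustomerID": [17850, 17850, 17850],
    "Country": ["United Kingdom"] * 3,
})

keys = ["InvoiceNo", "CustomerID", "Country"]
value_cols = ["NoStockCode", "Description", "Quantity"]

# Collect each non-key column into a Python list per group.
result = df.groupby(keys, as_index=False)[value_cols].agg(list)
print(result)

# Alternative: comma-separated strings, closer to the layout shown
# in the question's desired output.
joined = df.groupby(keys, as_index=False)[value_cols].agg(
    lambda s: ", ".join(map(str, s))
)
print(joined)
```

The first form keeps real lists in each cell, which is easier to work with afterwards; the second is mainly useful for display or export.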