I'm sure this has been asked before, sorry if duplicate. Suppose I have the following dataframe:
import pandas as pd
df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data': range(6)}, columns=['key', 'data'])
>>
  key  data
0   A     0
1   B     1
2   C     2
3   A     3
4   B     4
5   C     5
I know we can do a groupby on 'key' and aggregate, for example with df.groupby('key').sum():
>>
     data
key
A       3
B       5
C       7
What is the easiest way to get all the split data for each group in an array instead?
>>
       data
key
A    [0, 3]
B    [1, 4]
C    [2, 5]
I'm not necessarily grouping by just one key, but with several other indexes as well ('year' and 'month' for example) which is why I'd like to use the groupby function, but preserve all the grouped values in an array.
When the series are of different lengths, this returns a multi-indexed Series object. However, if every series has the same length, the result is pivoted into a DataFrame.
You can group DataFrame rows into a list by calling pandas.DataFrame.groupby() on the column of interest, selecting the column you want as a list from each group, and then using Series.apply(list) to get the list for every group.
You can use apply(list):
print(df.groupby('key').data.apply(list).reset_index())
  key    data
0   A  [0, 3]
1   B  [1, 4]
2   C  [2, 5]
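Since the question mentions grouping by more than one key, here is a minimal sketch of the same apply(list) approach with a list of columns passed to groupby. The 'year' and 'month' columns and their values are made up for illustration; they are not from the original data.

```python
import pandas as pd

# Hypothetical data: 'year' and 'month' stand in for the extra grouping keys
df = pd.DataFrame({
    'key':   ['A', 'B', 'A', 'B', 'A', 'B'],
    'year':  [2020, 2020, 2020, 2020, 2021, 2021],
    'month': [1, 1, 1, 1, 2, 2],
    'data':  range(6),
})

# Group by several columns, then collect each group's 'data' values into a list
grouped = df.groupby(['key', 'year', 'month'])['data'].apply(list).reset_index()
print(grouped)
```

Each distinct (key, year, month) combination becomes one row, and its 'data' cell holds the list of values that fell into that group. Dropping the reset_index() call should leave the grouping columns as a MultiIndex instead of regular columns.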