Convert groupby values into list of arrays [duplicate]

Tags:

python

pandas

Here's a sample dataframe:

label  data
a      1.09
b      2.1
a      5.0
b      2.0
c      1.9

What I want is

arr = [[1.09, 5.0], [2.1, 2.0],[1.9]]

preferably as a list of numpy arrays.

I know that df.groupby('label').groups.keys() gives me the list ['a','b','c'], and df.groupby('label').groups.values() gives me something like arr, but as Int64Index objects. However, I tried df.loc[df.groupby('label').groups.values()]['label'] and it doesn't give the desired result.

How do I accomplish this? Thanks!

asked Jun 21 '18 at 06:06 by irene



1 Answer

"preferably as a list of numpy arrays."

Preferably not, because you're asking for ragged arrays: the inner arrays (AKA the rows) are not all the same length. NumPy cannot store ragged data as a single contiguous C array internally, so it falls back to an array of Python objects, which is slow.
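To make the ragged-array point concrete, here is a minimal sketch using the values from the question. The variable names `rect` and `ragged` are made up for illustration. Recent NumPy versions only build the ragged case when `dtype=object` is passed explicitly.

```python
import numpy as np

# Rows of equal length pack into one contiguous 2-D float array.
rect = np.array([[1.09, 5.0], [2.1, 2.0]])
print(rect.dtype, rect.shape)

# Rows of unequal length cannot be packed that way. With dtype=object,
# NumPy stores a 1-D array whose elements are ordinary Python lists.
ragged = np.array([[1.09, 5.0], [2.1, 2.0], [1.9]], dtype=object)
print(ragged.dtype, ragged.shape)
print(type(ragged[0]))
```

The second array is one-dimensional and holds references to Python lists, so vectorised arithmetic across its rows is not available.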

In this situation, I'd recommend nested Python lists. That's achievable through a groupby followed by apply.

lst = df.groupby('label')['data'].apply(pd.Series.tolist).tolist()
print(lst)
# Output: [[1.09, 5.0], [2.1, 2.0], [1.9]]
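For anyone who wants to run this end to end, the following sketch rebuilds the sample frame from the question and applies the same groupby + apply. The second half is an assumption about what the asker might still want, not part of the recommendation above: a plain Python list whose elements are separate 1-D NumPy arrays, which avoids building a single ragged array. The name `arrs` is illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "label": ["a", "b", "a", "b", "c"],
    "data": [1.09, 2.1, 5.0, 2.0, 1.9],
})

# Nested Python lists, one inner list per label.
# groupby sorts the group keys by default, so the order is a, b, c.
lst = df.groupby("label")["data"].apply(pd.Series.tolist).tolist()
print(lst)  # [[1.09, 5.0], [2.1, 2.0], [1.9]]

# Alternative: a Python list of independent 1-D NumPy arrays.
# Iterating a grouped Series yields (key, sub-Series) pairs.
arrs = [group.to_numpy() for _, group in df.groupby("label")["data"]]
for a in arrs:
    print(a)
```

Each array in `arrs` has its own length and a float dtype, so per-group NumPy operations still work even though the groups cannot be stacked into one 2-D array.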
answered Sep 30 '22 at 10:09 by cs95