Here's a sample dataframe:
label  data
a      1.09
b      2.1
a      5.0
b      2.0
c      1.9
What I want is

arr = [[1.09, 5.0], [2.1, 2.0], [1.9]]

preferably as a list of numpy arrays. I know that df.groupby('label').groups.keys() gives me the list ['a', 'b', 'c'], and df.groupby('label').groups.values() gives me something like arr, but as Int64Index objects of row indices rather than the data values. However, I tried df.loc[df.groupby('label').groups.values()]['label'] and it isn't giving the desired result. How do I accomplish this? Thanks!
groupby() to group rows into lists: by using the DataFrame.groupby() function you can group rows on a column, select the column you want from the grouped result, and finally convert the values of each group to a list using apply(list), as in the sketch below.
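A minimal sketch of that approach, assuming the sample dataframe from the question (the df construction and variable names here are just for illustration):

import pandas as pd

df = pd.DataFrame({'label': ['a', 'b', 'a', 'b', 'c'],
                   'data': [1.09, 2.1, 5.0, 2.0, 1.9]})

# group on 'label', take the 'data' column, and collect each group into a list
grouped = df.groupby('label')['data'].apply(list)
print(grouped.tolist())
[[1.09, 5.0], [2.1, 2.0], [1.9]]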
So a groupby() operation can downcast a DataFrame to a Series or, if given a Series as input, upcast the result to a DataFrame, depending on what is returned per group. For your dataframe, the unequal group sizes (unequal index lengths) coerce a Series return, because the "combine" step of the operation cannot assemble the pieces into a rectangular DataFrame.
Groupby preserves the order of rows within each group. When calling apply, the group keys are added to the index to identify the pieces, and the dimensionality of the return type is reduced where possible (otherwise a consistent type is returned), as the short illustration below shows.
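A rough illustration of those two points, assuming the same sample dataframe (variable names are illustrative only):

import pandas as pd

df = pd.DataFrame({'label': ['a', 'b', 'a', 'b', 'c'],
                   'data': [1.09, 2.1, 5.0, 2.0, 1.9]})

# one scalar per group -> the result is reduced to a Series,
# with the group keys 'a', 'b', 'c' as its index
sums = df.groupby('label')['data'].apply(lambda s: s.sum())
print(type(sums).__name__, sums.index.tolist())
Series ['a', 'b', 'c']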
preferably as a list of numpy arrays.
Preferably not, because you're asking for ragged arrays, meaning the inner arrays (i.e., the rows) are not all of the same length. That is inconvenient for numpy: it cannot store such arrays efficiently as contiguous C arrays internally, so it falls back to slow Python objects.
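You can see this by building the ragged structure directly in numpy (recent numpy versions require dtype=object explicitly here, otherwise they raise an error):

import numpy as np

# the rows have different lengths, so numpy stores them as plain Python
# objects instead of a contiguous 2-D block of floats
ragged = np.array([[1.09, 5.0], [2.1, 2.0], [1.9]], dtype=object)
print(ragged.dtype)
object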
In this situation, I'd recommend nested Python lists. That's achievable through a groupby + apply.
lst = df.groupby('label')['data'].apply(pd.Series.tolist).tolist()
print(lst)
[[1.09, 5.0], [2.1, 2.0], [1.9]]
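If you really do want numpy arrays rather than lists, one possible variant (not from the original answer, and it assumes a pandas version that has Series.to_numpy()) is to iterate over the groups of the same df yourself:

arrs = [g.to_numpy() for _, g in df.groupby('label')['data']]
# arrs is a list of three 1-D float arrays, one per label, in the order 'a', 'b', 'c'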