I am trying to group a pandas DataFrame so that the result keeps the key as its index, but the key is not included in each group's data.
Here is an example of what I mean.
The original DataFrame:
ungrouped_df = pd.DataFrame({'col1':['A','A','B','C','C','C'], 'col2':[8,5,1,4,1,2], 'col3':[7,4,2,1,2,1],'col4':[1,8,0,2,0,0]})
out:
| index | col1 | col2 | col3 | col4 |
|-------|------|------|------|------|
| 0 | A | 8 | 7 | 1 |
| 1 | A | 5 | 4 | 8 |
| 2 | B | 1 | 2 | 0 |
| 3 | C | 4 | 1 | 2 |
| 4 | C | 1 | 2 | 0 |
| 5 | C | 2 | 1 | 0 |
Now I would like to create a NumPy array for each group of the grouped DataFrame:
grouped_df = ungrouped_df.groupby(by='col1', group_keys=False).apply(np.asarray)
This is what I get:
| index | col1 |
|-------|-------------------------------------------|
| A | [[A, 8, 7, 1], [A, 5, 4, 8]] |
| B | [[B, 1, 2, 0]] |
| C | [[C, 4, 1, 2], [C, 1, 2, 0], [C, 2, 1, 0]] |
This is what I want instead:
| index | col1 |
|-------|----------------------------------|
| A | [[8, 7, 1], [5, 4, 8]] |
| B | [[1, 2, 0]] |
| C | [[4, 1, 2], [1, 2, 0], [2, 1, 0]] |
I could use some advice here because I am a bit lost. I thought that "group_keys=False" would do the trick, but it doesn't. Any help is much appreciated.
Thanks
I generally don't recommend storing lists in columns, but the most obvious way to fix this is to make sure the grouping column is not among the columns passed to the applied function.
You can do that either with
df.set_index('col1').groupby(level=0).apply(np.array)
col1
A [[8, 7, 1], [5, 4, 8]]
B [[1, 2, 0]]
C [[4, 1, 2], [1, 2, 0], [2, 1, 0]]
OR,
df.drop(columns='col1').groupby(df['col1']).apply(np.array)
col1
A [[8, 7, 1], [5, 4, 8]]
B [[1, 2, 0]]
C [[4, 1, 2], [1, 2, 0], [2, 1, 0]]
OR,
(df.groupby('col1')[df.columns.difference(['col1'])]
.apply(lambda x: x.values.tolist()))
col1
A [[8, 7, 1], [5, 4, 8]]
B [[1, 2, 0]]
C [[4, 1, 2], [1, 2, 0], [2, 1, 0]]
dtype: object