groupby in pandas and exclude grouper column from output DataFrame

Question

I am trying to groupby a pandas df so that it keeps the key as index but it doesn't include the key in each group.

Here is an example of what I mean.

The original DataFrame:

import numpy as np
import pandas as pd

ungrouped_df = pd.DataFrame({'col1':['A','A','B','C','C','C'], 'col2':[8,5,1,4,1,2], 'col3':[7,4,2,1,2,1],'col4':[1,8,0,2,0,0]})

out:

| index | col1 | col2 | col3 | col4 |
|-------|------|------|------|------|
| 0     |    A |    8 |    7 |    1 |
| 1     |    A |    5 |    4 |    8 |
| 2     |    B |    1 |    2 |    0 |
| 3     |    C |    4 |    1 |    2 |
| 4     |    C |    1 |    2 |    0 |
| 5     |    C |    2 |    1 |    0 |

Now I would like to create a NumPy array for each group of the grouped DataFrame:

grouped_df = ungrouped_df.groupby(by='col1', group_keys=False).apply(np.asarray)

This is what I get:

| index | col1                                       |
|-------|--------------------------------------------|
| A     | [[A, 8, 7, 1], [A, 5, 4, 8]]               |
| B     | [[B, 1, 2, 0]]                             |
| C     | [[C, 4, 1, 2], [C, 1, 2, 0], [C, 2, 1, 0]] |

This is what I'd like to get instead:

out:

| index | col1                              |
|-------|-----------------------------------|
| A     | [[8, 7, 1], [5, 4, 8]]            |
| B     | [[1, 2, 0]]                       |
| C     | [[4, 1, 2], [1, 2, 0], [2, 1, 0]] |

I could use some advice here because I am a bit lost. I thought that group_keys=False would do the trick, but it doesn't. Any help is much appreciated.

Thanks

cs95 · Accepted Answer

I generally don't recommend storing lists in columns, but the most direct fix is to make sure the grouping column is not part of the data each group passes to the applied function.

You can do that in any of three ways (df below is the question's ungrouped_df):

- setting "col1" as the index before grouping, or
- dropping "col1" before grouping (and grouping on the original column), or
- selecting only the columns you DO want in each group

df.set_index('col1').groupby(level=0).apply(np.array)

col1
A               [[8, 7, 1], [5, 4, 8]]
B                          [[1, 2, 0]]
C    [[4, 1, 2], [1, 2, 0], [2, 1, 0]]

OR,

df.drop(columns='col1').groupby(df['col1']).apply(np.array)

col1
A               [[8, 7, 1], [5, 4, 8]]
B                          [[1, 2, 0]]
C    [[4, 1, 2], [1, 2, 0], [2, 1, 0]]

OR,

(df.groupby('col1')[df.columns.difference(['col1'])]
   .apply(lambda x: x.values.tolist()))

col1
A               [[8, 7, 1], [5, 4, 8]]
B                          [[1, 2, 0]]
C    [[4, 1, 2], [1, 2, 0], [2, 1, 0]]
dtype: object
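
For reference, here is a minimal self-contained sketch of the third idea (select only the value columns) that sidesteps the caveat about storing lists in columns by keeping one array per group in a plain dict instead of an object Series. The names value_cols and arrays are illustrative and not part of the original answer; the DataFrame is the one from the question.

```python
import numpy as np
import pandas as pd

# Data from the question.
df = pd.DataFrame({
    "col1": ["A", "A", "B", "C", "C", "C"],
    "col2": [8, 5, 1, 4, 1, 2],
    "col3": [7, 4, 2, 1, 2, 1],
    "col4": [1, 8, 0, 2, 0, 0],
})

# Every column except the grouping key. Note that Index.difference
# returns the labels sorted, which matches the original order here.
value_cols = df.columns.difference(["col1"])

# Iterate over the groups and convert only the value columns,
# so the key never ends up inside the arrays.
arrays = {
    key: group[value_cols].to_numpy()
    for key, group in df.groupby("col1")
}

print(arrays["A"])
```

Each entry is an ordinary 2-D NumPy array, so arrays["C"].shape is (3, 3) and the group key is only used as the dict key.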
