groupby in pandas and exclude grouper column from output DataFrame

I am trying to group a pandas DataFrame so that the key is kept as the index but is not included in each group's data.

Here is an example of what I mean.

  1. The original DataFrame

    import numpy as np
    import pandas as pd

    ungrouped_df = pd.DataFrame({'col1': ['A', 'A', 'B', 'C', 'C', 'C'], 'col2': [8, 5, 1, 4, 1, 2], 'col3': [7, 4, 2, 1, 2, 1], 'col4': [1, 8, 0, 2, 0, 0]})

out:

| index | col1 | col2 | col3 | col4 |
|-------|------|------|------|------|
| 0     |    A |    8 |    7 |    1 |
| 1     |    A |    5 |    4 |    8 |
| 2     |    B |    1 |    2 |    0 |
| 3     |    C |    4 |    1 |    2 |
| 4     |    C |    1 |    2 |    0 |
| 5     |    C |    2 |    1 |    0 |

  2. Now, I would like to create a numpy array from the grouped DataFrame

    grouped_df = ungrouped_df.groupby(by='col1', group_keys=False).apply(np.asarray)

This is what I get

| index | col1                                       |
|-------|--------------------------------------------|
| A     | [[A, 8, 7, 1], [A, 5, 4, 8]]               |
| B     | [[B, 1, 2, 0]]                             |
| C     | [[C, 4, 1, 2], [C, 1, 2, 0], [C, 2, 1, 0]] |

  3. This is what I'd like to get instead

out:

| index | col1                              |
|-------|-----------------------------------|
| A     | [[8, 7, 1], [5, 4, 8]]            |
| B     | [[1, 2, 0]]                       |
| C     | [[4, 1, 2], [1, 2, 0], [2, 1, 0]] |

I could use some advice here because I am a bit lost. I thought that "group_keys=False" would do the trick, but it doesn't. Any help is much appreciated.

Thanks

asked Mar 01 '23 19:03 by LIB


1 Answer

I generally don't recommend storing lists in columns, but the most obvious way to fix this is to make sure the grouping column is not part of the data passed to each group.

You can do that by either

  1. setting "col1" as the index before grouping, or
  2. dropping "col1" before grouping, or
  3. selecting only the columns you DO want in the output after grouping.

Here, df refers to the ungrouped_df from the question.

    df.set_index('col1').groupby(level=0).apply(np.array)

    col1
    A               [[8, 7, 1], [5, 4, 8]]
    B                          [[1, 2, 0]]
    C    [[4, 1, 2], [1, 2, 0], [2, 1, 0]]

OR,

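    # drop(columns=...) instead of drop('col1', 1): the positional axis
    # argument is no longer accepted as of pandas 2.0
    df.drop(columns='col1').groupby(df['col1']).apply(np.array)

    col1
    A               [[8, 7, 1], [5, 4, 8]]
    B                          [[1, 2, 0]]
    C    [[4, 1, 2], [1, 2, 0], [2, 1, 0]]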

OR,

    (df.groupby('col1')[df.columns.difference(['col1'])]
       .apply(lambda x: x.values.tolist()))

    col1
    A               [[8, 7, 1], [5, 4, 8]]
    B                          [[1, 2, 0]]
    C    [[4, 1, 2], [1, 2, 0], [2, 1, 0]]
    dtype: object
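
Since storing lists or arrays inside a pandas object is generally not recommended, one possible alternative is to keep the per-group arrays in a plain dict keyed by the group label. This is only a minimal sketch, not something taken from the question or the approaches above; the names `value_cols` and `arrays_by_group` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
ungrouped_df = pd.DataFrame({
    'col1': ['A', 'A', 'B', 'C', 'C', 'C'],
    'col2': [8, 5, 1, 4, 1, 2],
    'col3': [7, 4, 2, 1, 2, 1],
    'col4': [1, 8, 0, 2, 0, 0],
})

# Columns to keep in each array: everything except the grouping key.
value_cols = [c for c in ungrouped_df.columns if c != 'col1']

# Hypothetical name: one 2-D NumPy array per group, keyed by group label.
# Selecting value_cols explicitly keeps 'col1' out of every array.
arrays_by_group = {
    key: group[value_cols].to_numpy()
    for key, group in ungrouped_df.groupby('col1')
}

print(arrays_by_group['A'])
# [[8 7 1]
#  [5 4 8]]
```

Each value is then an ordinary numeric array, so `arrays_by_group['C'].shape` is `(3, 3)` and no object-dtype column is involved.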

answered Mar 08 '23 23:03 by cs95