
first/count applied to groupby returns empty dataframe

import pandas as pd

df = pd.DataFrame( {'A': [1,1,2,3,4,5,5,6,7,7,7,8]} )
dummy = df["A"]
print(dummy)

0     1
1     1
2     2
3     3
4     4
5     5
6     5
7     6
8     7
9     7
10    7
11    8
Name: A, dtype: int64

res = df.groupby(dummy)
print(res.first())

Empty DataFrame
Columns: []
Index: [1, 2, 3, 4, 5, 6, 7, 8]

Why does the last print result in an empty DataFrame? I expect each group to be a slice of the original df, where each slice contains as many rows as there are duplicates of a given value in column "A". What am I missing?

GaussD asked Mar 31 '26 17:03


1 Answer

My guess is that, by default, A is moved to the index before the groupby operation (e.g. first) is applied. Since A is the only column, nothing is left in df to aggregate, so the result has an index but no columns. If you have another column B:

df = pd.DataFrame( {'A': [1,1,2,3,4,5,5,6,7,7,7,8], 'B':range(12)} )

then, with dummy = df["A"] taken from this new df, df.groupby(dummy).first() shows A as the index and the first value of B in each group:

    B
A    
1   0
2   2
3   3
4   4
5   5
6   7
7   8
8  11
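
One way to confirm this is to compare the shapes of the two results. The following is only a minimal sketch: it assumes grouping by the column label "A" rather than by the Series dummy (the key column should be excluded from the result in the same way), and the names single and double are made up for illustration.

```python
import pandas as pd

# Same data as in the question: a single column "A".
single = pd.DataFrame({"A": [1, 1, 2, 3, 4, 5, 5, 6, 7, 7, 7, 8]})
# Same data plus a second column "B", as in the answer.
double = single.assign(B=range(12))

# Group by the column label; "A" becomes the index of each result.
first_single = single.groupby("A").first()
first_double = double.groupby("A").first()

# One row per group, but no columns left to aggregate.
print(first_single.shape)
# With "B" present, the first value of "B" per group is returned.
print(first_double["B"].tolist())
```

The first result has eight rows and zero columns, which is what pandas prints as "Empty DataFrame" with a non-empty index.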

Alternatively, if you pass as_index=False, groupby does not set A as the index, and the result is non-empty:

df.groupby(dummy, as_index=False).first()

gives:

   A
0  1
1  2
2  3
3  4
4  5
5  6
6  7
7  8

Or you can group by a copy of the column:

df.groupby(dummy.copy()).first()

and you get:

   A
A   
1  1
2  2
3  3
4  4
5  5
6  6
7  7
8  8
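
As for the slices the question expected: first() reduces each group to a single row, so it never returns the slices themselves. If the goal is the rows or the row count per value of A, something like the sketch below may be closer. It is an illustration rather than part of the original answer; it groups by the label "A", and the names groups, sizes and sevens are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 3, 4, 5, 5, 6, 7, 7, 7, 8]})
groups = df.groupby("A")

# size() counts the rows in each group and needs no non-key column.
sizes = groups.size()
print(sizes)

# get_group() returns the rows of the original frame for one key value.
sevens = groups.get_group(7)
print(sevens)
```

Here sizes is a Series indexed by the distinct values of A, and sevens holds the three rows where A equals 7.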

Quang Hoang answered Apr 02 '26 12:04