first/count applied to groupby returns empty dataframe

Question

import pandas as pd

df = pd.DataFrame( {'A': [1,1,2,3,4,5,5,6,7,7,7,8]} )
dummy = df["A"]
print(dummy)

0     1
1     1
2     2
3     3
4     4
5     5
6     5
7     6
8     7
9     7
10    7
11    8
Name: A, dtype: int64

res = df.groupby(dummy)
print(res.first())

Empty DataFrame
Columns: []
Index: [1, 2, 3, 4, 5, 6, 7, 8]

Why does the last print result in an empty DataFrame? I expected each group to be a slice of the original df, where each slice would contain as many rows as there are duplicates of a given value in column "A". What am I missing?

Quang Hoang · Accepted Answer

My guess is that, by default, A is moved into the index as the grouping key before the aggregation (e.g. first) is applied. With A taken out, df has no columns left, so there is nothing for first to aggregate and the result is empty. If you have another column B:

df = pd.DataFrame( {'A': [1,1,2,3,4,5,5,6,7,7,7,8], 'B':range(12)} )

then df.groupby(dummy).first() would show A as the index and the first value of B in each group.
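As a rough illustration of this point (a sketch, not the answerer's own code; the names only_a_df, with_b_df and sizes are made up here), the snippet below runs the same groupby on the one-column frame and on the frame with B. It also calls size() to show that the groups themselves still hold all the rows; only the aggregated result has no columns.

```python
import pandas as pd

values = [1, 1, 2, 3, 4, 5, 5, 6, 7, 7, 7, 8]

# Frame from the question: A is the only column.
only_a_df = pd.DataFrame({"A": values})
only_a = only_a_df.groupby(only_a_df["A"]).first()

# The groups are not empty: size() counts the rows in each group.
sizes = only_a_df.groupby(only_a_df["A"]).size()

# Frame from the answer: a second column B gives first() something to aggregate.
with_b_df = pd.DataFrame({"A": values, "B": range(12)})
with_b = with_b_df.groupby(with_b_df["A"]).first()

print(only_a)   # index 1..8, no columns
print(sizes)    # number of rows per value of A
print(with_b)   # index A, one column B
```

If the row counts per group are what you are after, size() is probably the more direct call than first() or count().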

On the other hand, if you pass as_index=False, groupby does not set A as the index, and the result is non-empty:

df.groupby(dummy, as_index=False).first()

gives a result in which A is an ordinary column rather than the index, with one row per group.
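A small sketch of the as_index=False variant, run on the two-column frame defined earlier in this answer (the name flat is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 3, 4, 5, 5, 6, 7, 7, 7, 8], "B": range(12)})
dummy = df["A"]

# as_index=False keeps the grouping key as a regular column
# and gives the result a default 0..n-1 index.
flat = df.groupby(dummy, as_index=False).first()
print(flat)
```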

Or you can group by a copy of the column:

df.groupby(dummy.copy()).first()

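and you get A both as the index and as a regular column, presumably because the copy is no longer recognized as a column of df.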
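The copy variant can be checked with a short sketch like the one below (by_copy is an illustrative name). The explanation in the comments is my reading of the behaviour, not something stated in the pandas documentation.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 3, 4, 5, 5, 6, 7, 7, 7, 8]})
dummy = df["A"]

# Grouping by a copy: pandas appears to treat the key as external data,
# so column A is not dropped from the values being aggregated.
by_copy = df.groupby(dummy.copy()).first()
print(by_copy)
```

Note that the result then carries the name A twice, once on the index and once as a column, so calling reset_index() on it would fail unless the index is renamed or dropped first.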

Tags: python, python-3.x, pandas

GaussD
