import pandas as pd
df = pd.DataFrame( {'A': [1,1,2,3,4,5,5,6,7,7,7,8]} )
dummy = df["A"]
print(dummy)
0     1
1     1
2     2
3     3
4     4
5     5
6     5
7     6
8     7
9     7
10    7
11    8
Name: A, dtype: int64
res = df.groupby(dummy)
print(res.first())
Empty DataFrame
Columns: []
Index: [1, 2, 3, 4, 5, 6, 7, 8]
Why does the last print result in an empty DataFrame? I expected each group to be a slice of the original df, where each slice would contain as many rows as there are duplicates of a given value in column "A". What am I missing?
My guess is that, by default, the grouping column A becomes the index of the result and is excluded from the columns that the aggregation (e.g. first) operates on. Since A is the only column in df, nothing is left to aggregate, so the result has an index but no columns. If you add another column B:
df = pd.DataFrame( {'A': [1,1,2,3,4,5,5,6,7,7,7,8], 'B':range(12)} )
then you would see A as the index and the first values for B in each group with df.groupby(dummy).first():
    B
A
1   0
2   2
3   3
4   4
5   5
6   7
7   8
8  11
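To check this explanation, here is a minimal sketch that runs both cases side by side. The variable names (only_a, with_b, empty_result, b_result) are made up for illustration, and the behaviour shown is what I would expect from a reasonably recent pandas version:

```python
import pandas as pd

# Same data as in the question: a single column A with repeated values.
only_a = pd.DataFrame({"A": [1, 1, 2, 3, 4, 5, 5, 6, 7, 7, 7, 8]})

# Grouping by the frame's own column leaves no other column to aggregate.
empty_result = only_a.groupby(only_a["A"]).first()

# Adding a second column B gives first() something to work on.
with_b = only_a.assign(B=range(12))
b_result = with_b.groupby(with_b["A"]).first()

print(list(empty_result.columns))  # no data columns
print(list(empty_result.index))    # the group keys
print(list(b_result.columns))      # only B, A has moved to the index
```

In both cases the group keys end up in the index; the only difference is whether any column remains to be aggregated.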
On another note, if you pass as_index=False, groupby does not set A as the index, and the result is non-empty:
df.groupby(dummy, as_index=False).first()
gives:
   A
0  1
1  2
2  3
3  4
4  5
5  6
6  7
7  8
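Or you can group by a copy of the column. The copy is apparently no longer treated as df's own column A, so A stays in the result as a data column in addition to labelling the index: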
df.groupby(dummy.copy()).first()
and you get:
   A
A
1  1
2  2
3  3
4  4
5  5
6  6
7  7
8  8
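As for the expectation in the question that each group is a slice of the original df: the groups do hold those rows, it is only first() that collapses each one to a single row. A rough sketch of how to look at the slices directly follows; grouping by the column name "A" instead of dummy is my own simplification, and slices and sizes are hypothetical names:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 3, 4, 5, 5, 6, 7, 7, 7, 8]})
grouped = df.groupby("A")

# Iterating yields (key, sub-frame) pairs; each sub-frame is a slice of df.
slices = {key: group for key, group in grouped}

# size() counts the rows per group without aggregating any column.
sizes = grouped.size()

print(slices[7])        # the three rows where A == 7
print(sizes.to_dict())  # number of rows per distinct value of A
```

grouped.get_group(7) should return the same slice as slices[7] if you only need one group.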