I'm working with the following DataFrame:
Date Id Amount
0 201301 1 nan
1 201302 1 nan
2 201303 1 100
3 201304 1 120
4 201305 1 nan
5 201306 1 120
6 201302 2 nan
7 201303 2 150
8 201304 2 180
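For reference, the sample frame can be rebuilt like this (my assumption: Amount is a float column holding np.nan):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': [201301, 201302, 201303, 201304, 201305, 201306, 201302, 201303, 201304],
    'Id': [1, 1, 1, 1, 1, 1, 2, 2, 2],
    'Amount': [np.nan, np.nan, 100, 120, np.nan, 120, np.nan, 150, 180],
})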
I'm trying to get the first valid index of Amount for each Id. For some reason this doesn't work:
df.groupby('Id').Amount.first_valid_index()
I also tried this:
df.groupby('Id').Amount.apply(lambda x: x.first_valid_index())
But my dataset has 20M+ rows, so it takes too long and won't work for me.
Is there any faster way to find the first index by group?
My desired output would be:
first_idx = [2,7]
Or even better:
Date Id Amount
2 201303 1 100
3 201304 1 120
4 201305 1 nan
5 201306 1 120
7 201303 2 150
8 201304 2 180
Edit: df.groupby('Id').Amount.apply(lambda x: x.first_valid_index()) does work, but I have the feeling there must be a faster option; the problem doesn't seem that complex.
Option 1: to get just the first indexes:
df[df.Amount.notna()].groupby('Id').Date.idxmin()
# 1.42 ms ± 14.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
outputs:
Id
1 2
2 7
Name: Date, dtype: int64
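To turn those per-Id indexes into the row-level output, one way (a sketch that assumes the default RangeIndex, sorted within each Id as in the sample) is to map each group's first index back onto the frame:

first_idx = df[df.Amount.notna()].groupby('Id').Date.idxmin()
# keep each row whose index is at or after its group's first valid index
df[df.index >= df['Id'].map(first_idx).to_numpy()]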
Option 2: to get the full rows (the second desired output), use cumsum() on notna():
df[df['Amount'].notna().groupby(df['Id']).cumsum().gt(0)]
# 2.09 ms ± 220 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
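To see why this works: the intermediate cumsum is a running count of non-null Amounts within each Id, so rows before the first valid value have a count of 0 and are masked out. A sketch of the intermediate:

counts = df['Amount'].notna().groupby(df['Id']).cumsum()
# Id 1: 0, 0, 1, 2, 2, 3  -> rows 0 and 1 are dropped
# Id 2: 0, 1, 2           -> row 6 is dropped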
Option 3: you can ffill() within each group and keep the rows that are still non-null after filling (leading NaNs have nothing before them to fill from, so they stay NaN and are dropped):
df[df.groupby('Id').Amount.ffill().notna()]
# 831 µs ± 14.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Output:
Date Id Amount
2 201303 1 100.0
3 201304 1 120.0
4 201305 1 NaN
5 201306 1 120.0
7 201303 2 150.0
8 201304 2 180.0
Conclusion: Option 3 is the fastest!
Update: to filter NaNs at both ends (leading and trailing) with Option 3's approach:
amt_group = df.groupby('Id').Amount
df[amt_group.bfill().notna() & amt_group.ffill().notna()]
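For example, with a hypothetical extra trailing-NaN row for Id 1 (invented here purely for illustration), the combined mask drops NaNs at both ends while keeping interior ones:

import numpy as np
import pandas as pd

df2 = pd.concat(
    [df, pd.DataFrame({'Date': [201307], 'Id': [1], 'Amount': [np.nan]})],
    ignore_index=True,
)
amt_group = df2.groupby('Id').Amount
df2[amt_group.bfill().notna() & amt_group.ffill().notna()]
# leading NaNs (rows 0, 1, 6) and the trailing NaN (row 9) are dropped;
# the interior NaN (row 4) survives since it can be filled from both sides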
Create a mask with .notnull + .cumsum to get everything from the first non-null Amount onward within each group, then slice with it.
m = df.Amount.notnull().groupby(df.Id).cumsum().ge(1)
df.loc[m]
Date Id Amount
2 201303 1 100.0
3 201304 1 120.0
4 201305 1 NaN
5 201306 1 120.0
7 201303 2 150.0
8 201304 2 180.0
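If you also want just the first indexes (the first_idx = [2, 7] output), the same mask can be reused; a small sketch:

df.loc[m].groupby('Id').head(1).index
# the first masked row per Id: indexes 2 and 7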