This post shows how to use first_valid_index to find the first occurrence of a value in a dataframe column. How do I use first_valid_index along with a daily groupby in order to find the first occurrence each day for the same example dataframe as shown in the linked post?
This is the groupby code I need to use:
grouper = pd.TimeGrouper("1d")
Edit:
When I use the lambda and apply approach it gives the correct output. However, I am unable to assign this output to a new column ['test_output'], as it just shows NaT:
df['test_output'] = df.groupby(grouper)['test_1'].apply(lambda x: x.first_valid_index())
df
Out[9]:
test_1 test_output
2014-03-04 09:00:00 NaN NaT
2014-03-04 10:00:00 NaN NaT
2014-03-04 11:00:00 NaN NaT
2014-03-04 12:00:00 NaN NaT
2014-03-04 13:00:00 NaN NaT
2014-03-04 14:00:00 1.0 NaT
2014-03-04 15:00:00 1.0 NaT
2014-03-04 16:00:00 1.0 NaT
2014-03-05 09:00:00 1.0 NaT
IIUC you can use first on your groupby object:
In [95]:
df.groupby(grouper).first()
Out[95]:
test_1
2014-03-04 1.0
2014-03-05 1.0
This should work; the above was generated using the same data as in your linked question.
EDIT
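I think the above is actually correct, as it's different from calling head(1): first returns the first non-NaN value in each group, whereas head(1) returns each group's first row even if it is NaN. For instance: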
In [3]:
df.groupby(grouper).head(1)
Out[3]:
test_1 test_output
2014-03-04 09:00:00 NaN NaN
2014-03-05 09:00:00 1 1
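To make the difference between first and head(1) easy to reproduce, here is a minimal sketch. The frame is a hypothetical reconstruction of the example data (two days of hourly rows, NaN in the first five), and pd.Grouper(freq="1D") is used as a stand-in for pd.TimeGrouper("1d"), which newer pandas releases no longer provide.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the example frame: two days of hourly rows,
# with NaN in the first five rows of the first day.
hour = pd.Timedelta(hours=1)
index = pd.date_range("2014-03-04 09:00", periods=8, freq=hour).append(
    pd.date_range("2014-03-05 09:00", periods=8, freq=hour)
)
df = pd.DataFrame({"test_1": [np.nan] * 5 + [1.0] * 11}, index=index)

# Assumption: pd.Grouper(freq="1D") stands in for pd.TimeGrouper("1d").
grouper = pd.Grouper(freq="1D")

# first() skips NaN and returns the first non-null value of each day.
first_non_null = df.groupby(grouper)["test_1"].first()

# head(1) returns the first row of each day as-is, NaN included.
first_rows = df.groupby(grouper).head(1)

print(first_non_null)
print(first_rows)
```

Note that head(1) keeps the original timestamps as its index, while first() is indexed by day.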
But you can also call first_valid_index using a lambda with apply:
In [6]:
df.groupby(grouper)['test_1'].apply(lambda x: x.first_valid_index())
Out[6]:
2014-03-04 2014-03-04 14:00:00
2014-03-05 2014-03-05 09:00:00
Name: test_1, dtype: datetime64[ns]
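For anyone who wants to run this end to end, the following is a self-contained sketch of the per-day first_valid_index lookup. The data is a hypothetical reconstruction of the example frame, and pd.Grouper(freq="1D") is assumed as a replacement for pd.TimeGrouper. The second variant, which drops NaN first and then takes the first remaining timestamp per day, is not from the answer above; it is just one alternative that avoids the lambda.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the example frame.
hour = pd.Timedelta(hours=1)
index = pd.date_range("2014-03-04 09:00", periods=8, freq=hour).append(
    pd.date_range("2014-03-05 09:00", periods=8, freq=hour)
)
df = pd.DataFrame({"test_1": [np.nan] * 5 + [1.0] * 11}, index=index)

# Timestamp of the first non-NaN value per day, via apply + first_valid_index.
# first_valid_index returns None for a group with no valid values.
per_day = df.groupby(pd.Grouper(freq="1D"))["test_1"].apply(
    lambda s: s.first_valid_index()
)

# Alternative sketch: drop NaN rows, then take the first timestamp left in each day.
valid = df["test_1"].dropna()
per_day_alt = valid.index.to_series().groupby(valid.index.normalize()).first()

print(per_day)
print(per_day_alt)
```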
EDIT
Adding this back as a column is a bit tricky, because you're trying to match the original index against the index of the daily groupby result. The two won't align, which is why you get NaT. What you can do is call to_series on the index so that we can access just the date attribute and then call map. map performs a lookup, so it will match each row's date against the groupby result and return the first valid date as desired:
In [136]:
df['first'] = df.index.to_series().dt.date.map(df.groupby(grouper)['test_1'].apply(lambda x: x.first_valid_index()))
df
Out[136]:
test_1 test_output first
2014-03-04 09:00:00 NaN NaN 2014-03-04 14:00:00
2014-03-04 10:00:00 NaN NaN 2014-03-04 14:00:00
2014-03-04 11:00:00 NaN NaN 2014-03-04 14:00:00
2014-03-04 12:00:00 NaN NaN 2014-03-04 14:00:00
2014-03-04 13:00:00 NaN NaN 2014-03-04 14:00:00
2014-03-04 14:00:00 1.0 1.0 2014-03-04 14:00:00
2014-03-04 15:00:00 1.0 1.0 2014-03-04 14:00:00
2014-03-04 16:00:00 1.0 1.0 2014-03-04 14:00:00
2014-03-05 09:00:00 1.0 1.0 2014-03-05 09:00:00
2014-03-05 10:00:00 1.0 1.0 2014-03-05 09:00:00
2014-03-05 11:00:00 1.0 1.0 2014-03-05 09:00:00
2014-03-05 12:00:00 1.0 1.0 2014-03-05 09:00:00
2014-03-05 13:00:00 1.0 1.0 2014-03-05 09:00:00
2014-03-05 14:00:00 1.0 1.0 2014-03-05 09:00:00
2014-03-05 15:00:00 1.0 1.0 2014-03-05 09:00:00
2014-03-05 16:00:00 1.0 1.0 2014-03-05 09:00:00
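The alignment problem and one way around it can be sketched as follows. This is not the exact map call shown above: it assumes a hypothetical reconstruction of the example frame, uses pd.Grouper(freq="1D") in place of pd.TimeGrouper, and looks each row's day up with index.normalize() and reindex, which keeps both sides as timestamps.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the example frame.
hour = pd.Timedelta(hours=1)
index = pd.date_range("2014-03-04 09:00", periods=8, freq=hour).append(
    pd.date_range("2014-03-05 09:00", periods=8, freq=hour)
)
df = pd.DataFrame({"test_1": [np.nan] * 5 + [1.0] * 11}, index=index)

# One row per day, indexed by midnight of that day.
per_day = df.groupby(pd.Grouper(freq="1D"))["test_1"].apply(
    lambda s: s.first_valid_index()
)

# Aligning the daily result on the hourly index finds no matching labels,
# which is what produces the all-NaT column.
misaligned = per_day.reindex(df.index)

# Look up each row's day (its timestamp truncated to midnight) instead,
# then assign the values positionally.
df["first"] = per_day.reindex(df.index.normalize()).to_numpy()

print(df)
```

Using to_numpy() drops the index of the looked-up values so the assignment is positional rather than label-based.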