Using groupby ("1d") and first_valid_index together

Tags: python, pandas

A linked post shows how to use first_valid_index to find the first occurrence of a value in a dataframe column. How do I combine first_valid_index with a daily groupby to find the first occurrence on each day, using the same example dataframe as in the linked post?

This is the groupby code I need to use:

grouper = pd.TimeGrouper("1d")

Edit:

The lambda and apply approach gives the correct output. However, I am unable to assign that output to a new column ['test_output'], as the column just shows NaT:

df['test_output'] = df.groupby(grouper)['test_1'].apply(lambda x: x.first_valid_index())

df
Out[9]:
test_1  test_output
2014-03-04 09:00:00 NaN NaT
2014-03-04 10:00:00 NaN NaT
2014-03-04 11:00:00 NaN NaT
2014-03-04 12:00:00 NaN NaT
2014-03-04 13:00:00 NaN NaT
2014-03-04 14:00:00 1.0 NaT
2014-03-04 15:00:00 1.0 NaT
2014-03-04 16:00:00 1.0 NaT
2014-03-05 09:00:00 1.0 NaT
nipy asked Jul 04 '16 14:07


1 Answer

IIUC you can use first on your groupby object:

In [95]:
df.groupby(grouper).first()

Out[95]:
            test_1
2014-03-04     1.0
2014-03-05     1.0

This should work; the output above was generated using the same data as your linked question.

EDIT

I think the above is actually correct. It differs from calling head(1), for instance, which returns the first row of each group even when its value is NaN:

In [3]:
df.groupby(grouper).head(1)

Out[3]:
                     test_1  test_output
2014-03-04 09:00:00     NaN          NaN
2014-03-05 09:00:00       1            1

You can also call first_valid_index using a lambda with apply:

In [6]:
df.groupby(grouper)['test_1'].apply(lambda x: x.first_valid_index())

Out[6]:
2014-03-04   2014-03-04 14:00:00
2014-03-05   2014-03-05 09:00:00
Name: test_1, dtype: datetime64[ns]
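
To make the difference between the three calls concrete, here is a minimal sketch on a small made-up frame (not the asker's full data). It assumes a pandas version where pd.TimeGrouper is no longer available, so pd.Grouper(freq="1D") stands in for it; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Small illustrative frame: the first day starts with a NaN at 09:00.
idx = pd.to_datetime(
    ["2014-03-04 09:00", "2014-03-04 14:00", "2014-03-04 15:00", "2014-03-05 09:00"]
)
df = pd.DataFrame({"test_1": [np.nan, 1.0, 1.0, 1.0]}, index=idx)

# Assumption: pd.Grouper(freq=...) replaces pd.TimeGrouper in newer pandas.
grouper = pd.Grouper(freq="1D")
grouped = df.groupby(grouper)["test_1"]

# first(): first non-null VALUE per day, indexed by day.
first_value = grouped.first()

# head(1): first ROW per day, NaN included, indexed by the original timestamps.
first_row = grouped.head(1)

# first_valid_index(): TIMESTAMP of the first non-null value per day.
first_stamp = grouped.apply(lambda s: s.first_valid_index())

print(first_value)
print(first_row)
print(first_stamp)
```

So first() answers "what was the first valid value", while the apply version answers "when did it occur", which is what the question asks for.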

EDIT

Adding this back as a column is a bit tricky. You are trying to align the original index against the index of the daily groupby result, and the two don't match, which is why you get NaT. What you can do is call to_series on the index, so that you can access just the date attribute and then call map. map performs a lookup, so it matches each row's date against the groupby result and returns the first valid timestamp for that day, as desired:

In [136]:
df['first'] = df.index.to_series().dt.date.map(df.groupby(grouper)['test_1'].apply(lambda x: x.first_valid_index()))
df

Out[136]:
                     test_1  test_output               first
2014-03-04 09:00:00     NaN          NaN 2014-03-04 14:00:00
2014-03-04 10:00:00     NaN          NaN 2014-03-04 14:00:00
2014-03-04 11:00:00     NaN          NaN 2014-03-04 14:00:00
2014-03-04 12:00:00     NaN          NaN 2014-03-04 14:00:00
2014-03-04 13:00:00     NaN          NaN 2014-03-04 14:00:00
2014-03-04 14:00:00     1.0          1.0 2014-03-04 14:00:00
2014-03-04 15:00:00     1.0          1.0 2014-03-04 14:00:00
2014-03-04 16:00:00     1.0          1.0 2014-03-04 14:00:00
2014-03-05 09:00:00     1.0          1.0 2014-03-05 09:00:00
2014-03-05 10:00:00     1.0          1.0 2014-03-05 09:00:00
2014-03-05 11:00:00     1.0          1.0 2014-03-05 09:00:00
2014-03-05 12:00:00     1.0          1.0 2014-03-05 09:00:00
2014-03-05 13:00:00     1.0          1.0 2014-03-05 09:00:00
2014-03-05 14:00:00     1.0          1.0 2014-03-05 09:00:00
2014-03-05 15:00:00     1.0          1.0 2014-03-05 09:00:00
2014-03-05 16:00:00     1.0          1.0 2014-03-05 09:00:00
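
For readers on a newer pandas, the same idea can be sketched as follows. This is a hedged variant, not the answer's exact code: it assumes pd.Grouper(freq="1D") in place of pd.TimeGrouper, and it uses index.normalize() instead of dt.date so that the lookup keys are midnight Timestamps, the same type as the index of the grouped result. The frame is rebuilt to resemble the one in the question.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the question's: hourly rows 09:00-16:00 on two
# days, with NaN until 14:00 on the first day.
hour = pd.Timedelta(hours=1)
idx = pd.date_range("2014-03-04 09:00", periods=8, freq=hour).append(
    pd.date_range("2014-03-05 09:00", periods=8, freq=hour)
)
df = pd.DataFrame({"test_1": [np.nan] * 5 + [1.0] * 11}, index=idx)

# Assumption: pd.Grouper(freq=...) replaces pd.TimeGrouper in newer pandas.
grouper = pd.Grouper(freq="1D")
first_valid = df.groupby(grouper)["test_1"].apply(lambda s: s.first_valid_index())

# Why direct assignment gives NaT: first_valid is labelled by day (midnight),
# and none of those labels appear in the hourly index, so alignment finds nothing.
misaligned = first_valid.reindex(df.index)

# Map each row's day (timestamp floored to midnight) onto the per-day result.
days = pd.Series(df.index.normalize(), index=df.index)
df["first"] = days.map(first_valid)

print(df)
```

The reindex line only illustrates the failed alignment behind the NaT column; the map on normalized timestamps is what broadcasts each day's first valid timestamp to every row of that day.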
EdChum answered Oct 11 '22 02:10