Starting with something like this:
from pandas import DataFrame
time = np.array(('2015-08-01T00:00:00','2015-08-01T12:00:00'),dtype='datetime64[ns]')
heat_index = np.array([101,103])
air_temperature = np.array([96,95])
df = DataFrame({'heat_index':heat_index,'air_temperature':air_temperature},index=time)
yielding this for df
:
air_temperature heat_index
2015-08-01 07:00:00 96 101
2015-08-01 19:00:00 95 103
then resample daily:
df_daily = df.resample('24H',how='max')
To get this for df_daily
:
air_temperature heat_index
2015-08-01 96 103
So by resampling using how='max'
pandas resamples each 24 hour period, taking the maximum value within that period from each column.
But as you can see looking at df
output for 2015-08-01
, that day's maximum heat index (which occurs at 19:00:00
) does not correlate with air temperature occurred at the same time. That is, the heat index of 103F was caused with an air temperature of 95F. This association is lost through resampling, and we end up looking at the air temperature from a different part of the day.
Is there a way to resample just one column, and preserve the value in another column at the same index? So that the final outcome would look like this:
air_temperature heat_index
2015-08-01 95 103
My first guess is to just resample the heat_index
column...
df_daily = df.resample('24H',how={'heat_index':'max'})
to get...
air_temperature
2015-08-01 103
...and then trying to do some sort of DataFrame.loc or DataFrame.ix from there, but have been unsuccessful. Any thoughts on how to find the related value after resampling (e.g. to find the air_temperature
that occurred at the same time as what is later found to be the maximum heat_index
)?
The resample() function is used to resample time-series data. Convenience method for frequency conversion and resampling of time series. Object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or pass datetime-like values to the on or level keyword.
resample is more general than asfreq . For example, using resample I can pass an arbitrary function to perform binning over a Series or DataFrame object in bins of arbitrary size. asfreq is a concise way of changing the frequency of a DatetimeIndex object. It also provides padding functionality.
First ensure that your dataframe has an index of type DateTimeIndex . Then use the resample function to either upsample (higher frequency) or downsample (lower frequency) your dataframe. Then apply an aggregator (e.g. sum ) to aggregate the values across the new sampling frequency.
Here's one way - the .groupby(TimeGrouper())
is essentially what resample
is doing, then the aggregation function filters each group to the max observation.
In [60]: (df.groupby(pd.TimeGrouper('24H'))
.agg(lambda df: df.loc[df['heat_index'].idxmax(), :]))
Out[60]:
air_temperature heat_index
2015-08-01 95 103
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With