I have a time series with temperature and radiation in a pandas DataFrame. The time resolution is 1 minute, in regular steps.
    import datetime
    import pandas as pd
    import numpy as np

    date_times = pd.date_range(datetime.datetime(2012, 4, 5, 8, 0),
                               datetime.datetime(2012, 4, 5, 12, 0),
                               freq='1min')
    tamb = np.random.sample(date_times.size) * 10.0
    radiation = np.random.sample(date_times.size) * 10.0
    frame = pd.DataFrame(data={'tamb': tamb, 'radiation': radiation},
                         index=date_times)
    frame

    <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 241 entries, 2012-04-05 08:00:00 to 2012-04-05 12:00:00
    Freq: T
    Data columns:
    radiation    241 non-null values
    tamb         241 non-null values
    dtypes: float64(2)
How can I down-sample this DataFrame to a resolution of one hour, computing the hourly mean for the temperature and the hourly sum for radiation?
The resample() method is a convenience method for frequency conversion and resampling of time series. The object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or you must point resample() at a datetime-like column or index level with the on or level keyword.
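As a minimal sketch of the on keyword mentioned above: the frame below is a hypothetical example (the column name time and the values are made up for illustration) in which the timestamps sit in an ordinary column instead of the index.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two hours of 1-minute samples, timestamps in a column
stamps = pd.date_range("2012-04-05 08:00", periods=120, freq="min")
df = pd.DataFrame({"time": stamps, "tamb": np.arange(120, dtype=float)})

# on="time" tells resample() which datetime-like column to bin by,
# so the frame does not need a DatetimeIndex
hourly = df.resample("60min", on="time")["tamb"].mean()
print(hourly)
```

The result is indexed by the start of each hourly bin, just as if time had been the index.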
Resampling a time series regroups the existing observations into bins of a new frequency. Various frequencies can be applied to the data (for example minutes, hours, or days), and many other frequency aliases are available.
First ensure that your DataFrame has an index of type DatetimeIndex. Then use the resample() method to either upsample (to a higher frequency) or downsample (to a lower frequency) your DataFrame. Then apply an aggregator (e.g. sum) to aggregate the values across the new sampling frequency.
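The steps above can be sketched on a tiny series. The data here is invented for illustration, and forward-filling is just one possible choice for the upsampling step, since upsampling creates new slots that have no observed value.

```python
import pandas as pd

# Hypothetical series: four samples, 15 minutes apart
idx = pd.date_range("2012-04-05 08:00", periods=4, freq="15min")
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

# Step 1: confirm the index is a DatetimeIndex
assert isinstance(s.index, pd.DatetimeIndex)

# Step 2a: downsample to 30-minute bins and aggregate with sum
down = s.resample("30min").sum()

# Step 2b: upsample to a 5-minute grid; new slots are forward-filled
up = s.resample("5min").ffill()

print(down)
print(up)
```

Downsampling needs an aggregator because several values fall into each bin, while upsampling needs a fill or interpolation rule because most new slots start out empty.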
Resampling hourly data to daily data works the same way with the resample() method. To aggregate the data over a time period, take all of the values for each day and summarize them. For example, if you want total daily rainfall, use resample() together with .sum().
With pandas 0.18 the resample API changed (see the docs). So for pandas >= 0.18 the answer is:
    In [31]: frame.resample('1H').agg({'radiation': np.sum, 'tamb': np.mean})
    Out[31]:
                             tamb   radiation
    2012-04-05 08:00:00  5.161235  279.507182
    2012-04-05 09:00:00  4.968145  290.941073
    2012-04-05 10:00:00  4.478531  317.678285
    2012-04-05 11:00:00  4.706206  335.258633
    2012-04-05 12:00:00  2.457873    8.655838
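A self-contained variant of the same call, as a sketch: it passes the aggregations as the strings 'mean' and 'sum' and writes the frequency as '60min'. As far as I know, recent pandas releases emit deprecation warnings for the uppercase 'H' alias and for passing NumPy functions such as np.sum to agg, so this spelling is meant to run quietly on both older and newer versions. The seeded random generator is my addition so that the run is repeatable.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question: 241 one-minute samples
date_times = pd.date_range("2012-04-05 08:00", "2012-04-05 12:00", freq="min")
rng = np.random.default_rng(0)  # seeded only to make the example repeatable
frame = pd.DataFrame(
    {
        "tamb": rng.random(date_times.size) * 10.0,
        "radiation": rng.random(date_times.size) * 10.0,
    },
    index=date_times,
)

# Hourly mean for temperature, hourly sum for radiation
hourly = frame.resample("60min").agg({"tamb": "mean", "radiation": "sum"})
print(hourly)
```

Note that the final 12:00 bin contains only the single 12:00 sample, which is why the last radiation value in the output above is so much smaller than the others.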
Old Answer:
I am answering my own question to reflect the time-series-related changes in pandas >= 0.8 (all other answers are outdated).
Using pandas >= 0.8 the answer is:
    In [30]: frame.resample('1H', how={'radiation': np.sum, 'tamb': np.mean})
    Out[30]:
                             tamb   radiation
    2012-04-05 08:00:00  5.161235  279.507182
    2012-04-05 09:00:00  4.968145  290.941073
    2012-04-05 10:00:00  4.478531  317.678285
    2012-04-05 11:00:00  4.706206  335.258633
    2012-04-05 12:00:00  2.457873    8.655838