
How to resample a dataframe with different functions applied to each column?

I have a time series of temperature and radiation in a pandas DataFrame. The time resolution is 1 minute, in regular steps.

    import datetime
    import pandas as pd
    import numpy as np

    date_times = pd.date_range(datetime.datetime(2012, 4, 5, 8, 0),
                               datetime.datetime(2012, 4, 5, 12, 0),
                               freq='1min')
    tamb = np.random.sample(date_times.size) * 10.0
    radiation = np.random.sample(date_times.size) * 10.0
    frame = pd.DataFrame(data={'tamb': tamb, 'radiation': radiation},
                         index=date_times)
    frame
    <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 241 entries, 2012-04-05 08:00:00 to 2012-04-05 12:00:00
    Freq: T
    Data columns:
    radiation    241  non-null values
    tamb         241  non-null values
    dtypes: float64(2)

How can I down-sample this dataframe to a resolution of one hour, computing the hourly mean for the temperature and the hourly sum for radiation?

asked Apr 04 '12 23:04 by bmu




1 Answer

With pandas 0.18 the resample API changed (see the docs). So for pandas >= 0.18 the answer is:

    In [31]: frame.resample('1H').agg({'radiation': np.sum, 'tamb': np.mean})
    Out[31]:
                             tamb   radiation
    2012-04-05 08:00:00  5.161235  279.507182
    2012-04-05 09:00:00  4.968145  290.941073
    2012-04-05 10:00:00  4.478531  317.678285
    2012-04-05 11:00:00  4.706206  335.258633
    2012-04-05 12:00:00  2.457873    8.655838
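For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same per-column aggregation. It is an illustration, not part of the original answer: the seeded random generator, the string aggregator names ('mean', 'sum') and the lowercase '1h' alias are my assumptions, chosen because recent pandas releases warn about the uppercase 'H' alias and about passing NumPy functions to agg. On older releases the '1H' spelling shown above should behave the same way.

```python
import numpy as np
import pandas as pd

# Rebuild the 1-minute frame from the question.
# Assumption: a seeded generator, so repeated runs give the same numbers.
rng = np.random.default_rng(0)
date_times = pd.date_range("2012-04-05 08:00", "2012-04-05 12:00", freq="1min")
frame = pd.DataFrame(
    {
        "tamb": rng.random(date_times.size) * 10.0,
        "radiation": rng.random(date_times.size) * 10.0,
    },
    index=date_times,
)

# One aggregation per column: hourly mean for temperature,
# hourly sum for radiation.
# Assumption: "1h" is the current spelling of the hourly alias;
# older pandas releases write it as "1H".
hourly = frame.resample("1h").agg({"tamb": "mean", "radiation": "sum"})
print(hourly)
```

The result has five rows because the final 12:00 timestamp opens its own bin containing a single sample, which is also why the last radiation sum in the output above is so much smaller than the others.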

Old Answer:

I am answering my own question to reflect the time-series-related changes in pandas >= 0.8 (all other answers are outdated).

Using pandas >= 0.8 the answer is:

    In [30]: frame.resample('1H', how={'radiation': np.sum, 'tamb': np.mean})
    Out[30]:
                             tamb   radiation
    2012-04-05 08:00:00  5.161235  279.507182
    2012-04-05 09:00:00  4.968145  290.941073
    2012-04-05 10:00:00  4.478531  317.678285
    2012-04-05 11:00:00  4.706206  335.258633
    2012-04-05 12:00:00  2.457873    8.655838
answered Sep 28 '22 13:09 by bmu
