Can someone please explain the difference between the asfreq and resample methods in pandas? When should one use which?
The resample() function is primarily used for time series data. A time series is a series of data points indexed (or listed or graphed) in time order; most commonly, it is a sequence taken at successive, equally spaced points in time.
The asfreq() function converts a time series to a specified frequency. It can optionally take a fill method (pad/backfill) for values that are missing at the new timestamps. It returns the original data conformed to a new index with the specified frequency.
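A minimal sketch of asfreq with and without a fill method, assuming a small hypothetical daily Series with one missing day (the data and variable names are illustrative only):

```python
import pandas as pd

# Hypothetical daily readings with a gap: 2024-01-03 is missing
idx = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04"])
s = pd.Series([1.0, 2.0, 4.0], index=idx)

# Conform to a regular daily index; the new 2024-01-03 slot becomes NaN
daily = s.asfreq("D")

# Same conversion, but pad the new slot with the last valid value
daily_padded = s.asfreq("D", method="pad")

print(daily)
print(daily_padded)
```

Existing values are left untouched in both cases; the fill method only affects timestamps that are new in the regular index.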
The resample() function is a convenience method for frequency conversion and resampling of time series data. The object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or you must pass datetime-like values via the on or level keyword.
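When the timestamps live in a column rather than in the index, the on keyword points resample at that column. A sketch, assuming a hypothetical DataFrame with a "when" column and an "amount" column:

```python
import pandas as pd

# Hypothetical DataFrame whose timestamps are a regular column, not the index
df = pd.DataFrame({
    "when": pd.to_datetime(["2024-01-01 09:00", "2024-01-01 17:00", "2024-01-02 10:00"]),
    "amount": [10, 5, 7],
})

# on= tells resample which datetime-like column to bin by
per_day = df.resample("D", on="when")["amount"].sum()
print(per_day)
```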
As previously mentioned, resample() is a method of pandas DataFrames (and Series) that can be used to summarize data by date or time. For example, chaining .sum() adds up all values in each resampling period (e.g. each day) to give one summary value for that period.
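A short sketch of per-period summaries, assuming a hypothetical Series sampled every 6 hours with no data at all on one day. Note that an empty period sums to 0, while the mean of an empty period is NaN:

```python
import pandas as pd

# Hypothetical readings; nothing was recorded on 2024-01-02
idx = pd.to_datetime(["2024-01-01 00:00", "2024-01-01 06:00", "2024-01-01 12:00",
                      "2024-01-03 00:00"])
s = pd.Series([1, 2, 3, 10], index=idx)

daily_sum = s.resample("D").sum()    # one summed value per day (empty day -> 0)
daily_mean = s.resample("D").mean()  # a different aggregation over the same bins (empty day -> NaN)

print(daily_sum)
print(daily_mean)
```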
resample is more general than asfreq. For example, using resample I can pass an arbitrary function to perform binning over a Series or DataFrame object in bins of arbitrary size. asfreq is a concise way of changing the frequency of a DatetimeIndex object. It also provides padding functionality.
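To illustrate "an arbitrary function over bins of arbitrary size", here is a sketch with 45-minute bins and a hypothetical user-defined aggregation (value_range is an illustrative name, not a pandas function):

```python
import numpy as np
import pandas as pd

# Hypothetical one-minute data covering two hours (values 0..119)
idx = pd.date_range("2024-01-01 00:00", periods=120, freq="min")
s = pd.Series(np.arange(120, dtype=float), index=idx)

def value_range(x):
    """Hypothetical custom aggregation: spread between max and min in a bin."""
    return x.max() - x.min()

# Bins of an arbitrary width (45 minutes) aggregated with the custom function
spread = s.resample("45min").apply(value_range)
print(spread)
```

With the default origin, the bins start at midnight (00:00, 00:45, 01:30), so the last bin is only partially filled.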
As the pandas documentation says, asfreq is a thin wrapper around a call to date_range + a call to reindex. See here for an example.
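A rough sketch of that equivalence for the simple case (no fill method, no normalize), assuming a small hypothetical Series: build a regular index with date_range from the first to the last timestamp, then reindex onto it.

```python
import pandas as pd

idx = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04"])
s = pd.Series([1.0, 2.0, 4.0], index=idx)

# The convenient spelling
via_asfreq = s.asfreq("D")

# Roughly the same thing by hand: a regular index spanning the data, then reindex
new_index = pd.date_range(s.index.min(), s.index.max(), freq="D")
via_reindex = s.reindex(new_index)

print(via_asfreq.equals(via_reindex))
```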
An example of resample that I use in my daily work is computing the number of spikes of a neuron in 1-second bins by resampling a large boolean array where True means "spike" and False means "no spike". I can do that as easily as large_bool.resample('s').sum(). Kind of neat!
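A self-contained sketch of the spike count, assuming a hypothetical 3-second stand-in for large_bool sampled every millisecond. In current pandas the aggregation is chained after resample (the older how= keyword no longer exists):

```python
import pandas as pd

# Hypothetical stand-in for large_bool: True means "spike", False means "no spike"
idx = pd.date_range("2024-01-01", periods=3000, freq="ms")
spikes = pd.Series(False, index=idx)
spikes.iloc[[10, 500, 1200, 2999]] = True  # made-up spike positions

# Summing booleans counts the True values in each 1-second bin
counts = spikes.resample("s").sum()
print(counts)
```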
asfreq can be used when you want to change a DatetimeIndex to have a different frequency while retaining the same values at the current index.
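A small sketch of the difference when downsampling, using a hypothetical 10-minute Series: asfreq keeps only the values that sit exactly on the new 30-minute timestamps, while resample aggregates everything inside each 30-minute bin.

```python
import pandas as pd

# Hypothetical values at 00:00, 00:10, ..., 00:50
idx = pd.date_range("2024-01-01 00:00", periods=6, freq="10min")
s = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

picked = s.asfreq("30min")          # values at 00:00 and 00:30 only
summed = s.resample("30min").sum()  # sums over [00:00, 00:30) and [00:30, 01:00)

print(picked)
print(summed)
```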
Here's an example where they are equivalent, because every one-business-day bin contains at most one of the original observations:
    import numpy as np
    import pandas as pd

    dr = pd.date_range('1/1/2010', periods=3, freq=3 * pd.offsets.BDay())
    raw = np.random.randn(3)
    ts = pd.Series(raw, index=dr)

    print(ts)
    # 2010-01-01   -1.948
    # 2010-01-06    0.112
    # 2010-01-11   -0.117
    # Freq: 3B, dtype: float64

    print(ts.asfreq(pd.offsets.BDay()))
    # 2010-01-01   -1.948
    # 2010-01-04      NaN
    # 2010-01-05      NaN
    # 2010-01-06    0.112
    # 2010-01-07      NaN
    # 2010-01-08      NaN
    # 2010-01-11   -0.117
    # Freq: B, dtype: float64

    # Older pandas aggregated with the mean by default; current pandas
    # needs the aggregation spelled out explicitly.
    print(ts.resample(pd.offsets.BDay()).mean())
    # 2010-01-01   -1.948
    # 2010-01-04      NaN
    # 2010-01-05      NaN
    # 2010-01-06    0.112
    # 2010-01-07      NaN
    # 2010-01-08      NaN
    # 2010-01-11   -0.117
    # Freq: B, dtype: float64
As for when to use which: it depends on the problem you have in mind. Care to share?