I'm working with time series data that represents vectors (magnitude and direction). I want to resample my data and use the describe function as the how parameter.
However, the describe method uses a standard average, and I want to use a special function to average direction. Because of this, I implemented my own describe method based on the implementation of pandas.Series.describe():
import numpy as np
from pandas import Series

def directionAverage(x):
    # circular mean of angles in radians, shifted into [0, 2*pi)
    result = np.arctan2(np.mean(np.sin(x)), np.mean(np.cos(x)))
    if result < 0:
        result += 2*np.pi
    return result

def directionDescribe(x):
    # same statistics as describe(), but with the circular mean
    data = [directionAverage(x), x.std(), x.min(), x.quantile(0.25), x.median(), x.quantile(0.75), x.max()]
    names = ['mean', 'std', 'min', '25%', '50%', '75%', 'max']
    return Series(data, index=names)
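To illustrate why the standard average is a problem for directions, here is a minimal sketch. The two sample angles are made up for illustration; directionAverage is the same function as above, repeated so the snippet runs on its own.

```python
import numpy as np

def directionAverage(x):
    # circular mean of angles in radians, shifted into [0, 2*pi)
    result = np.arctan2(np.mean(np.sin(x)), np.mean(np.cos(x)))
    if result < 0:
        result += 2 * np.pi
    return result

# Hypothetical sample: two directions just either side of north
angles = np.deg2rad([350.0, 10.0])

# The arithmetic mean lands on 180 degrees, i.e. pointing south
arithmetic = np.rad2deg(np.mean(angles))

# The circular mean lands next to north (very close to 0 or 360 degrees,
# depending on floating-point rounding)
circular = np.rad2deg(directionAverage(angles))

print(arithmetic, circular)
```

The arithmetic mean ignores the wrap-around at 360 degrees, which is why I cannot use the built-in describe for the direction column.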
The problem is that when I do:
df['direction'].resample('10Min', how=directionDescribe)
I get this exception (last few lines are shown):
  File "C:\Python26\lib\site-packages\pandas\core\generic.py", line 234, in resample
    return sampler.resample(self)
  File "C:\Python26\lib\site-packages\pandas\tseries\resample.py", line 83, in resample
    rs = self._resample_timestamps(obj)
  File "C:\Python26\lib\site-packages\pandas\tseries\resample.py", line 217, in _resample_timestamps
    result = grouped.aggregate(self._agg_method)
  File "C:\Python26\lib\site-packages\pandas\core\groupby.py", line 1626, in aggregate
    result = self._aggregate_generic(arg, *args, **kwargs)
  File "C:\Python26\lib\site-packages\pandas\core\groupby.py", line 1681, in _aggregate_generic
    return self._aggregate_item_by_item(func, *args, **kwargs)
  File "C:\Python26\lib\site-packages\pandas\core\groupby.py", line 1706, in _aggregate_item_by_item
    result[item] = colg.aggregate(func, *args, **kwargs)
  File "C:\Python26\lib\site-packages\pandas\core\groupby.py", line 1357, in aggregate
    result = self._aggregate_named(func_or_funcs, *args, **kwargs)
  File "C:\Python26\lib\site-packages\pandas\core\groupby.py", line 1441, in _aggregate_named
    raise Exception('Must produce aggregated value')
The question is: how do I implement my own describe function so that it works with resample?
Instead of resampling, you can use groupby with a unit of time as the group. You can then apply a function of your choice to each group, for example your directionAverage or directionDescribe function.
Note that I am importing the TimeGrouper class to allow grouping by time intervals.
import pandas as pd
import numpy as np
from pandas.tseries.resample import TimeGrouper

# group your data into 10-minute bins
new_data = df['direction'].groupby(TimeGrouper('10min'))

# apply your function to the grouped data
new_data.apply(directionDescribe)
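On recent pandas versions, TimeGrouper can no longer be imported and resample no longer accepts a how argument; pd.Grouper(freq=...) plays the same role. The following is a minimal sketch of the same approach under that assumption. The sample DataFrame is hypothetical (random directions, one reading per minute), and the two functions are the ones from the question, repeated so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

def directionAverage(x):
    # circular mean of angles in radians, shifted into [0, 2*pi)
    result = np.arctan2(np.mean(np.sin(x)), np.mean(np.cos(x)))
    if result < 0:
        result += 2 * np.pi
    return result

def directionDescribe(x):
    # same statistics as describe(), but with the circular mean
    data = [directionAverage(x), x.std(), x.min(), x.quantile(0.25),
            x.median(), x.quantile(0.75), x.max()]
    names = ['mean', 'std', 'min', '25%', '50%', '75%', 'max']
    return pd.Series(data, index=names)

# Hypothetical sample data: 30 one-minute readings in radians
index = pd.date_range('2012-01-01', periods=30, freq='min')
rng = np.random.default_rng(0)
df = pd.DataFrame({'direction': rng.uniform(0, 2 * np.pi, size=30)},
                  index=index)

# Group into 10-minute bins and apply the custom describe to each bin
grouped = df['direction'].groupby(pd.Grouper(freq='10min'))
summary = grouped.apply(directionDescribe)

# Depending on the pandas version, the result may come back as a Series
# with a (time, statistic) MultiIndex; unstack it into a table if so
if isinstance(summary, pd.Series):
    summary = summary.unstack()

print(summary)
```

Each row of summary is one 10-minute bin and each column is one of the statistics returned by directionDescribe. Bins with no readings are not handled here and would need extra care.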