I am trying to resample a pandas DataFrame with a timestamp index to an hourly frequency. I want to obtain the most frequent value (the mode) for a column containing strings. However, the built-in resampling methods do not include mode among the defaults (as they do 'mean' and 'count').
I tried to define my own function and pass it, but it is not working. I've also tried using np.bincount, but that does not work since I am working with strings.
Here is how my data looks:
                     station_arrived action     lat1      lon1
date_removed
2012-01-01 13:12:00               56      A  19.4171 -99.16561
2012-01-01 13:12:00               56      A  19.4271 -99.16361
2012-01-01 15:41:00               56      A  19.4171 -99.16561
2012-01-02 08:41:00               56      C  19.4271 -99.16561
2012-01-02 11:36:00               56      C  19.2171 -99.16561
This is my code so far:
from collections import Counter

def mode(algo):
    # Returns a one-element list containing the most common item
    common = [ite for ite, it in Counter(algo).most_common(1)]
    return common

hourlycount2 = travels2012.resample('H', how={'station_arrived': 'count',
                                              'action': mode(travels2012['action']),
                                              'lat1': 'count', 'lon1': 'count'})
hourlycount2.head()
I see the following error:
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\generic.py", line 2836, in resample
return sampler.resample(self).__finalize__(self)
File "C:\Program Files\Anaconda\lib\site-packages\pandas\tseries\resample.py", line 83, in resample
rs = self._resample_timestamps()
File "C:\Program Files\Anaconda\lib\site-packages\pandas\tseries\resample.py", line 277, in _resample_timestamps
result = grouped.aggregate(self._agg_method)
File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\groupby.py", line 2404, in aggregate
result[col] = colg.aggregate(agg_how)
File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\groupby.py", line 2076, in aggregate
ret = self._aggregate_multiple_funcs(func_or_funcs)
File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\groupby.py", line 2125, in _aggregate_multiple_funcs
results[name] = self.aggregate(func)
File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\groupby.py", line 2073, in aggregate
return getattr(self, func_or_funcs)(*args, **kwargs)
File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\groupby.py", line 486, in __getattr__
(type(self).__name__, attr))
AttributeError: 'SeriesGroupBy' object has no attribute 'A '
The values in the dict have to be either strings naming aggregation methods (e.g. 'count'/'sum'/'max') or functions which are applied to each group. What you are passing is the result (the value) of mode(travels2012['action']): pandas finds the string 'A ' inside it and tries to look it up as an aggregation method, hence the AttributeError above.
So you need to make this a function, which is applied to each group:
In [11]: df.resample('H', how={'station_arrived': 'count',
                               'action': lambda x: mode(df['action']),
                               'lat1': 'count', 'lon1': 'count'})
Out[11]:
                    action  station_arrived  lon1  lat1
date_removed
2012-01-01 13:00:00    [A]                2     2     2
2012-01-01 14:00:00    [A]                0     0     0
2012-01-01 15:00:00    [A]                1     1     1
2012-01-01 16:00:00    [A]                0     0     0
...
I'm not sure that this is what you want (as it is applying to the entire column), perhaps you want to take the mode for each group:
In [12]: df.resample('H', how={'station_arrived': 'count',
                               'action': mode, 'lat1': 'count', 'lon1': 'count'})
Out[12]:
                    action  station_arrived  lon1  lat1
date_removed
2012-01-01 13:00:00    [A]                2     2     2
2012-01-01 14:00:00     []                0     0     0
2012-01-01 15:00:00    [A]                1     1     1
2012-01-01 16:00:00     []                0     0     0
...
I would prefer to see the actual value (A) rather than have it wrapped in a list, and NaN rather than [].
I think it's worth mentioning the Series mode method, which has the caveat that it always returns a Series (since there may be a tie) and that the Series is empty if no value appears more than once.
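To make that caveat concrete, here is a minimal sketch, assuming the mode behaviour described above (values occurring only once are dropped):

import pandas as pd

s = pd.Series(['A', 'A', 'B'])
s.mode()    # Series containing just 'A', since 'A' appears twice

t = pd.Series(['A'])
t.mode()    # empty Series: no value appears more than once,
            # so t.mode()[0] would raise an IndexError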
You could wrap around it as follows (and you can similarly wrap your mode function):
import numpy as np

def mode_(s):
    try:
        # Take the first (most common) value; mode() may return several on a tie
        return s.mode()[0]
    except IndexError:
        # mode() returned an empty Series, so fall back to NaN
        return np.nan
In [22]: df.resample('H', how={'station_arrived': 'count',
                               'action': mode_, 'lat1': 'count', 'lon1': 'count'})
Out[22]:
                    action  station_arrived  lon1  lat1
date_removed
2012-01-01 13:00:00      A                2     2     2
2012-01-01 14:00:00    NaN                0     0     0
2012-01-01 15:00:00    NaN                1     1     1
2012-01-01 16:00:00    NaN                0     0     0
...
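As an aside, the how= keyword used above has since been removed from pandas; in current versions the same aggregation is spelled with .agg on the resampler. A rough sketch of the equivalent (not tested against your exact version):

hourlycount2 = travels2012.resample('H').agg({'station_arrived': 'count',
                                              'action': mode_,
                                              'lat1': 'count',
                                              'lon1': 'count'})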