
No numeric types to aggregate - change in groupby() behaviour?

Tags: python, pandas

I have a problem with some groupby code which I'm quite sure once ran (on an older pandas version). On 0.9, I get "No numeric types to aggregate" errors. Any ideas?

In [31]: data
Out[31]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2557 entries, 2004-01-01 00:00:00 to 2010-12-31 00:00:00
Freq: <1 DateOffset>
Columns: 360 entries, -89.75 to 89.75
dtypes: object(360)

In [32]: latedges = linspace(-90., 90., 73)

In [33]: lats_new = linspace(-87.5, 87.5, 72)

In [34]: def _get_gridbox_label(x, bins, labels):
   ....:     return labels[searchsorted(bins, x) - 1]
   ....:

In [35]: lat_bucket = lambda x: _get_gridbox_label(x, latedges, lats_new)

In [36]: data.T.groupby(lat_bucket).mean()
---------------------------------------------------------------------------
DataError                                 Traceback (most recent call last)
<ipython-input-36-ed9c538ac526> in <module>()
----> 1 data.T.groupby(lat_bucket).mean()

/usr/lib/python2.7/site-packages/pandas/core/groupby.py in mean(self)
    295         """
    296         try:
--> 297             return self._cython_agg_general('mean')
    298         except DataError:
    299             raise

/usr/lib/python2.7/site-packages/pandas/core/groupby.py in _cython_agg_general(self, how, numeric_only)
   1415
   1416     def _cython_agg_general(self, how, numeric_only=True):
-> 1417         new_blocks = self._cython_agg_blocks(how, numeric_only=numeric_only)
   1418         return self._wrap_agged_blocks(new_blocks)
   1419

/usr/lib/python2.7/site-packages/pandas/core/groupby.py in _cython_agg_blocks(self, how, numeric_only)
   1455
   1456         if len(new_blocks) == 0:
-> 1457             raise DataError('No numeric types to aggregate')
   1458
   1459         return new_blocks

DataError: No numeric types to aggregate
andreas-h asked Oct 11 '12 16:10


2 Answers

How are you generating your data?

See how the output shows that your data is of 'object' type? The groupby operations specifically check whether each column is a numeric dtype first.

In [31]: data
Out[31]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2557 entries, 2004-01-01 00:00:00 to 2010-12-31 00:00:00
Freq: <1 DateOffset>
Columns: 360 entries, -89.75 to 89.75
dtypes: object(360)

↑ look at the last line: all 360 columns are of object dtype.
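A quick way to confirm this on your own frame is to inspect the dtypes directly. The sketch below is only an illustration: the small `data` frame is a hypothetical stand-in for the one in the question, built with object columns on purpose.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical stand-in for the asker's frame: numbers stored in object columns.
data = pd.DataFrame(
    np.arange(6, dtype=float).reshape(3, 2),
    index=pd.date_range("2004-01-01", periods=3, freq="D"),
    columns=[-89.75, -89.25],
).astype(object)

# Per-column dtypes; here every column reports "object".
print(data.dtypes)

# Ask pandas whether each column counts as numeric.
numeric_flags = {col: is_numeric_dtype(data[col]) for col in data.columns}
print(numeric_flags)
```

If every flag is False, a numeric-only aggregation has nothing left to work with, which matches the error message in the question.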


Did you initialize an empty DataFrame first and then fill it? If so, that's probably why the behaviour changed with the new version: before 0.9, empty DataFrames were initialized to float type, but now they are of object type. In that case you can change the initialization to DataFrame(dtype=float).

You can also call frame.astype(float).
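Both fixes can be sketched as follows. This is a minimal illustration, not the asker's actual code: the four latitude columns and the values are made up, and `lat_bucket` is a condensed version of the helper from the question. How a recent pandas treats object columns in `groupby().mean()` may differ from 0.9, so the sketch only shows the converted, float case.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2004-01-01", periods=2, freq="D")
lats = [-89.75, -89.25, -87.25, -86.75]  # hypothetical subset of the 360 columns

# Option 1: declare the dtype when creating the empty frame, then fill it.
filled = pd.DataFrame(index=dates, columns=lats, dtype=float)
filled.loc[dates[0]] = [1.0, 3.0, 5.0, 7.0]
filled.loc[dates[1]] = [2.0, 4.0, 6.0, 8.0]

# Option 2: convert a frame that already ended up with object columns.
as_object = filled.astype(object)
converted = as_object.astype(float)

# Bin edges and labels as in the question.
latedges = np.linspace(-90.0, 90.0, 73)
lats_new = np.linspace(-87.5, 87.5, 72)

def lat_bucket(x):
    # Map a latitude to the label of the bin it falls into.
    return lats_new[np.searchsorted(latedges, x) - 1]

# With float columns the grouped mean runs on the transposed frame.
zonal = converted.T.groupby(lat_bucket).mean()
print(zonal)
```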

Chang She answered Sep 22 '22 23:09


I got this error when generating a data frame consisting of timestamps and data:

df = pd.DataFrame({'data':value}, index=pd.DatetimeIndex(timestamp)) 

Adding the suggested solution works for me:

df = pd.DataFrame({'data':value}, index=pd.DatetimeIndex(timestamp), dtype=float)

Thanks Chang She!

Example:

                     data
2005-01-01 00:10:00  7.53
2005-01-01 00:20:00  7.54
2005-01-01 00:30:00  7.62
2005-01-01 00:40:00  7.68
2005-01-01 00:50:00  7.81
2005-01-01 01:00:00  7.95
2005-01-01 01:10:00  7.96
2005-01-01 01:20:00  7.95
2005-01-01 01:30:00  7.98
2005-01-01 01:40:00  8.06
2005-01-01 01:50:00  8.04
2005-01-01 02:00:00  8.06
2005-01-01 02:10:00  8.12
2005-01-01 02:20:00  8.12
2005-01-01 02:30:00  8.25
2005-01-01 02:40:00  8.27
2005-01-01 02:50:00  8.17
2005-01-01 03:00:00  8.21
2005-01-01 03:10:00  8.29
2005-01-01 03:20:00  8.31
2005-01-01 03:30:00  8.25
2005-01-01 03:40:00  8.19
2005-01-01 03:50:00  8.17
2005-01-01 04:00:00  8.18

                         data
2005-01-01 00:00:00  7.636000
2005-01-01 01:00:00  7.990000
2005-01-01 02:00:00  8.165000
2005-01-01 03:00:00  8.236667
2005-01-01 04:00:00  8.180000
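For reference, here is one way the two tables above could be produced. The answer does not say how the hourly values were computed, so the use of `resample` with an hourly offset is an assumption. The names `value` and `timestamp` follow the snippet above, and the numbers are copied from the first table.

```python
import pandas as pd

# 10-minute readings copied from the first table.
value = [7.53, 7.54, 7.62, 7.68, 7.81,
         7.95, 7.96, 7.95, 7.98, 8.06, 8.04,
         8.06, 8.12, 8.12, 8.25, 8.27, 8.17,
         8.21, 8.29, 8.31, 8.25, 8.19, 8.17,
         8.18]
timestamp = pd.date_range("2005-01-01 00:10", periods=len(value), freq="10min")

# Passing dtype=float keeps the column numeric so it can be aggregated.
df = pd.DataFrame({'data': value}, index=pd.DatetimeIndex(timestamp), dtype=float)

# Assumed aggregation step: mean over each clock hour.
hourly = df.resample(pd.offsets.Hour()).mean()
print(hourly)
```

Under that assumption, the hourly means agree with the second table.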
Blackbrook answered Sep 19 '22 23:09