 

"No numeric types to aggregate" after groupby and mean

Tags:

python

pandas

I'm working with time series and am trying to write a function that calculates the monthly average of the data. Here are some helper functions:

import datetime
import numpy
import pandas
def date_range_0(start,end):

    dates = [start + datetime.timedelta(days=i) 
            for i in range((end-start).days+1)]
    return numpy.array(dates)
def date_range_1(start,days):
    # days should be an integer

    return date_range_0(start,start+datetime.timedelta(days-1))

x=date_range_1(datetime.datetime(2015, 5, 17),4)

Evaluating x gives a simple array of datetimes:

array([datetime.datetime(2015, 5, 17, 0, 0),
   datetime.datetime(2015, 5, 18, 0, 0),
   datetime.datetime(2015, 5, 19, 0, 0),
   datetime.datetime(2015, 5, 20, 0, 0)], dtype=object)
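For reference, pandas can generate the same daily sequence with pd.date_range. This is a swapped-in alternative, not the code used in the question: it returns a DatetimeIndex with a datetime64 dtype rather than a NumPy object array of datetime.datetime values, which is the kind of dtype difference that turns out to matter below.

```python
import datetime

import pandas as pd

# periods=4 mirrors date_range_1(start, 4): four consecutive days from start.
start = datetime.datetime(2015, 5, 17)
x = pd.date_range(start, periods=4, freq="D")
print(x)
```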

Then I learned about the groupby function from http://blog.csdn.net/youngbit007/article/details/54288603. I tried one of the examples from that site, and it works fine:

df = pandas.DataFrame({'key1':date_range_1(datetime.datetime(2015, 1, 17),5),
              'key2': [2015001,2015001,2015001,2015001,2015001],
              'data1': 1+0.1*numpy.arange(1,6)
        })
df

gives

   data1    key1    key2
0   1.1 2015-01-17  2015001
1   1.2 2015-01-18  2015001
2   1.3 2015-01-19  2015001
3   1.4 2015-01-20  2015001
4   1.5 2015-01-21  2015001

and

grouped=df['data1'].groupby(df['key2'])
grouped.mean()

gives

key2
2015001    1.3
Name: data1, dtype: float64
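One way to see why this version works is to inspect the column dtypes. The minimal sketch below rebuilds the same frame, with pd.date_range standing in for the date_range_1 helper (a substitution for illustration only):

```python
import numpy as np
import pandas as pd

# Same data as the working example; pd.date_range replaces date_range_1 here.
df = pd.DataFrame({
    "key1": pd.date_range("2015-01-17", periods=5, freq="D"),
    "key2": [2015001] * 5,
    "data1": 1 + 0.1 * np.arange(1, 6),  # a float array, so the column is float64
})

print(df.dtypes)
# data1 is numeric, so groupby().mean() has something to average
means = df["data1"].groupby(df["key2"]).mean()
print(means)
```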

Then I tried my own example:

datedat=numpy.array([date_range_1(datetime.datetime(2015, 1, 17),5),1+0.1*numpy.arange(1,6)]).T
months = [day.month for day in datedat[:,0]]
years = [day.year for day in datedat[:,0]]
datedatF = pandas.DataFrame({'key1': datedat[:,0],
                             'key2': list(numpy.array(years)*1000 + numpy.array(months)),
                             'data1': datedat[:,1]})
datedatF

which generated

   data1    key1    key2
0   1.1 2015-01-17  2015001
1   1.2 2015-01-18  2015001
2   1.3 2015-01-19  2015001
3   1.4 2015-01-20  2015001
4   1.5 2015-01-21  2015001

Note that this is exactly the same table as above! So far so good. Then I run:

grouped2=datedatF['data1'].groupby(datedatF['key2'])
grouped2.mean()

it throws this:

   ---------------------------------------------------------------------------
DataError                                 Traceback (most recent call last)
<ipython-input-170-f0d2bc225b88> in <module>()
  1 grouped2=datedatF['data1'].groupby(datedatF['key2'])
----> 2 grouped2.mean()

/root/anaconda3/lib/python3.6/site-packages/pandas/core/groupby.py in     mean(self, *args, **kwargs)
   1017         nv.validate_groupby_func('mean', args, kwargs)
   1018         try:
-> 1019             return self._cython_agg_general('mean')
   1020         except GroupByError:
   1021             raise

/root/anaconda3/lib/python3.6/site-packages/pandas/core/groupby.py in     _cython_agg_general(self, how, numeric_only)
    806 
    807         if len(output) == 0:
--> 808             raise DataError('No numeric types to aggregate')
    809 
    810         return self._wrap_aggregated_output(output, names)

DataError: No numeric types to aggregate

What did I do wrong? Why can't I take the mean of the second DataFrame? It looks exactly the same as the working example!

asked Jan 09 '18 at 15:01 by Harry




2 Answers

Your data1 column has dtype object, so you need to convert it with pd.to_numeric first:

datedatF.dtypes
Out[39]: 
data1            object
key1     datetime64[ns]
key2              int64
dtype: object
grouped2=pd.to_numeric(datedatF['data1']).groupby(datedatF['key2'])
grouped2.mean()
Out[41]: 
key2
2015001    1.3
Name: data1, dtype: float64
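Since the question's end goal is a monthly average, note that once the values are numeric, the month can be taken straight from the dates instead of building a year*1000 + month key. A minimal sketch using the .dt.to_period accessor, with made-up sample data (40 consecutive days spanning January and February 2015):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 40 consecutive days starting 2015-01-17,
# so the values span January (15 days) and February (25 days).
df = pd.DataFrame({
    "key1": pd.date_range("2015-01-17", periods=40, freq="D"),
    "data1": np.arange(40, dtype=float),
})

# Group on the calendar month of each date; data1 is float64, so mean() works.
monthly = df.groupby(df["key1"].dt.to_period("M"))["data1"].mean()
print(monthly)
```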
answered Oct 02 '22 at 05:10 by BENY


Your data1 column has object dtype (the values are floats stored as generic Python objects, not strings):

In [396]: datedatF.dtypes
Out[396]:
data1            object   # <--- NOTE!
key1     datetime64[ns]
key2              int64
dtype: object
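The object dtype comes from how datedat is built: numpy.array([dates, floats]) must give the whole 2-D array a single dtype, and the only one that can hold both datetime.datetime values and floats is object, so the float column inherits it. The sketch below reproduces that, and also shows that passing the original arrays as separate DataFrame columns keeps data1 as float64 (an alternative to converting afterwards, not part of the original answer):

```python
import datetime

import numpy as np
import pandas as pd

dates = np.array(
    [datetime.datetime(2015, 1, 17) + datetime.timedelta(days=i) for i in range(5)]
)  # object array of datetime.datetime, like date_range_1 returns
values = 1 + 0.1 * np.arange(1, 6)  # float64

# Stacking datetimes and floats forces one common dtype: object.
datedat = np.array([dates, values]).T
print(datedat.dtype)  # object

# Slicing that array keeps object dtype, so data1 ends up as object too.
frame_obj = pd.DataFrame({"key1": datedat[:, 0], "data1": datedat[:, 1]})
print(frame_obj["data1"].dtype)  # object

# Passing the original arrays as separate columns keeps data1 numeric.
frame_num = pd.DataFrame({"key1": dates, "data1": values})
print(frame_num["data1"].dtype)  # float64
```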

So try this:

In [397]: datedatF.assign(data1=pd.to_numeric(datedatF['data1'], errors='coerce')) \
                  .groupby('key2')['data1'].mean()
Out[397]:
key2
2015001    1.3
Name: data1, dtype: float64
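The errors='coerce' argument only matters when some entries cannot be parsed as numbers: with the default errors='raise', pd.to_numeric raises a ValueError, while 'coerce' replaces those entries with NaN, which mean() then skips by default. A small illustration with made-up values:

```python
import pandas as pd

# An object column in which one entry cannot be parsed as a number.
raw = pd.Series([1.1, "1.2", "n/a", 1.4], dtype=object)

# errors="coerce" turns unparseable entries into NaN instead of raising;
# the default errors="raise" would fail on "n/a".
numeric = pd.to_numeric(raw, errors="coerce")
print(numeric.dtype)   # float64
print(numeric.mean())  # NaN is skipped, so this averages 1.1, 1.2 and 1.4
```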
answered Oct 02 '22 at 05:10 by MaxU - stop WAR against UA