I'm working with time series and trying to write a function that calculates the monthly average of my data. Here are some helper functions I use to prepare the data:
```python
import datetime
import numpy

def date_range_0(start, end):
    # every day from start to end, inclusive
    dates = [start + datetime.timedelta(days=i)
             for i in range((end - start).days + 1)]
    return numpy.array(dates)

def date_range_1(start, days):
    # days should be an integer
    return date_range_0(start, start + datetime.timedelta(days - 1))

x = date_range_1(datetime.datetime(2015, 5, 17), 4)
x
```
Evaluating x gives a simple array of datetimes:
```
array([datetime.datetime(2015, 5, 17, 0, 0),
       datetime.datetime(2015, 5, 18, 0, 0),
       datetime.datetime(2015, 5, 19, 0, 0),
       datetime.datetime(2015, 5, 20, 0, 0)], dtype=object)
```
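As an aside, pandas can build the same daily sequence directly. This is a minimal sketch, assuming a reasonably recent pandas; date_range_pd is a hypothetical name, not part of the question's code:

```python
import datetime
import numpy
import pandas

# Hypothetical replacement for date_range_1: same start date and number of
# days, but built with pandas.date_range at daily frequency.
def date_range_pd(start, days):
    return pandas.date_range(start=start, periods=days, freq="D")

idx = date_range_pd(datetime.datetime(2015, 5, 17), 4)
print(idx)

# Convert back to an object array of datetime.datetime values, matching the
# type returned by date_range_1, if plain Python datetimes are needed.
as_objects = numpy.array(idx.to_pydatetime())
print(as_objects)
```

Keeping the result as a DatetimeIndex is usually preferable inside pandas, since it stays a datetime64 column when put into a DataFrame.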
Then I learned about the groupby function from http://blog.csdn.net/youngbit007/article/details/54288603. I tried one of the examples from that page and it works fine:
```python
import pandas

df = pandas.DataFrame({'key1': date_range_1(datetime.datetime(2015, 1, 17), 5),
                       'key2': [2015001, 2015001, 2015001, 2015001, 2015001],
                       'data1': 1 + 0.1 * numpy.arange(1, 6)
                       })
df
```
gives
```
   data1       key1     key2
0    1.1 2015-01-17  2015001
1    1.2 2015-01-18  2015001
2    1.3 2015-01-19  2015001
3    1.4 2015-01-20  2015001
4    1.5 2015-01-21  2015001
```
and
```python
grouped = df['data1'].groupby(df['key2'])
grouped.mean()
```
gives
```
key2
2015001    1.3
Name: data1, dtype: float64
```
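Since the underlying goal is a monthly average, one option (a sketch, not the asker's method) is to group on the calendar month of the datetime column itself, rather than on a hand-built year*1000+month integer key. The two-month sample data and the names sample and monthly are made up for illustration:

```python
import numpy
import pandas

# Made-up sample spanning two months so the grouping is visible.
sample = pandas.DataFrame({
    "key1": pandas.date_range("2015-01-30", periods=5, freq="D"),
    "data1": 1 + 0.1 * numpy.arange(1, 6),
})

# Group on the calendar month of key1 instead of a year*1000+month integer.
monthly = sample.groupby(sample["key1"].dt.to_period("M"))["data1"].mean()
print(monthly)
```

The resulting index holds Period values such as 2015-01, so year and month stay together without any integer encoding.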
Then I tried my own example:
```python
datedat = numpy.array([date_range_1(datetime.datetime(2015, 1, 17), 5),
                       1 + 0.1 * numpy.arange(1, 6)]).T
months = [day.month for day in datedat[:, 0]]
years = [day.year for day in datedat[:, 0]]
datedatF = pandas.DataFrame({'key1': datedat[:, 0],
                             'key2': list(numpy.array(years) * 1000 + numpy.array(months)),
                             'data1': datedat[:, 1]})
datedatF
```
which generated
```
   data1       key1     key2
0    1.1 2015-01-17  2015001
1    1.2 2015-01-18  2015001
2    1.3 2015-01-19  2015001
3    1.4 2015-01-20  2015001
4    1.5 2015-01-21  2015001
```
Note that this is exactly the same table as above! So far so good. Then I run:
```python
grouped2 = datedatF['data1'].groupby(datedatF['key2'])
grouped2.mean()
```
and it throws this:
```
---------------------------------------------------------------------------
DataError                                 Traceback (most recent call last)
<ipython-input-170-f0d2bc225b88> in <module>()
      1 grouped2=datedatF['data1'].groupby(datedatF['key2'])
----> 2 grouped2.mean()

/root/anaconda3/lib/python3.6/site-packages/pandas/core/groupby.py in mean(self, *args, **kwargs)
   1017         nv.validate_groupby_func('mean', args, kwargs)
   1018         try:
-> 1019             return self._cython_agg_general('mean')
   1020         except GroupByError:
   1021             raise

/root/anaconda3/lib/python3.6/site-packages/pandas/core/groupby.py in _cython_agg_general(self, how, numeric_only)
    806
    807         if len(output) == 0:
--> 808             raise DataError('No numeric types to aggregate')
    809
    810         return self._wrap_aggregated_output(output, names)

DataError: No numeric types to aggregate
```
Oh... what did I do wrong? Why can't I take the mean of the second pandas.DataFrame? It's exactly the same as the successful example!
The data1 column in your DataFrame has object dtype, so you need to convert it with pd.to_numeric first:
```
datedatF.dtypes
Out[39]:
data1            object
key1     datetime64[ns]
key2              int64
dtype: object
```
```
grouped2 = pd.to_numeric(datedatF['data1']).groupby(datedatF['key2'])
grouped2.mean()
Out[41]:
key2
2015001    1.3
Name: data1, dtype: float64
```
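Why is data1 object in the first place? A sketch, assuming current NumPy/pandas behaviour: datedat is a single 2-D NumPy array holding both datetimes and floats, and NumPy can give it only one dtype, so it falls back to object, and the float column sliced out of it stays object. Building the DataFrame from the separate arrays avoids the conversion step entirely. The names dates, values, stacked, typed and result are illustrative:

```python
import datetime
import numpy
import pandas as pd

dates = numpy.array([datetime.datetime(2015, 1, 17) + datetime.timedelta(days=i)
                     for i in range(5)])           # object array of datetimes
values = 1 + 0.1 * numpy.arange(1, 6)               # float64 array

# Stacking both arrays forces one common dtype; NumPy falls back to object,
# so the float column loses its numeric dtype as well.
datedat = numpy.array([dates, values]).T
print(datedat.dtype)                                # object
stacked = pd.DataFrame({"data1": datedat[:, 1]})    # data1 stays object here

# Building the frame column by column keeps each column's own dtype.
typed = pd.DataFrame({
    "key1": pd.to_datetime(dates),
    "data1": values,
})
typed["key2"] = typed["key1"].dt.year * 1000 + typed["key1"].dt.month
print(typed.dtypes)

result = typed["data1"].groupby(typed["key2"]).mean()
print(result)
```

With data1 kept as float64 from the start, no pd.to_numeric call is needed before grouping.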
Your data1 column has object dtype (generic Python objects rather than a numeric dtype):
```
In [396]: datedatF.dtypes
Out[396]:
data1            object   # <--- NOTE!
key1     datetime64[ns]
key2              int64
dtype: object
```
So try this:
```
In [397]: datedatF.assign(data1=pd.to_numeric(datedatF['data1'], errors='coerce')) \
              .groupby('key2')['data1'].mean()
Out[397]:
key2
2015001    1.3
Name: data1, dtype: float64
```
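The errors='coerce' argument matters when an object column might contain values that are not numbers. A small sketch with made-up data (the values below are hypothetical, not from the question) showing the difference from the default errors='raise', and that the grouped mean then ignores the resulting NaN:

```python
import pandas as pd

# Made-up object-dtype column: a float, a numeric string and an unparseable string.
data1 = pd.Series([1.1, "1.2", "n/a", 1.5], dtype=object)
key2 = pd.Series([2015001] * 4)

# Default errors="raise": one bad value makes the whole conversion fail.
try:
    pd.to_numeric(data1)
    raised = False
except ValueError as exc:
    raised = True
    print("errors='raise':", exc)

# errors="coerce": unparseable values become NaN and the result is float64.
numeric = pd.to_numeric(data1, errors="coerce")
print(numeric.dtype)  # float64

# mean() skips NaN by default, so the group mean uses the three valid values.
result = numeric.groupby(key2).mean()
print(result)
```

Coercing silently drops bad values from the average, so it can be worth checking numeric.isna().sum() before trusting the result.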