How to convert DatetimeIndexResampler to DataFrame?

Tags: python, pandas

I want to build a matrix from several series, but first I have to resample them. To avoid processing the whole matrix twice with replace(np.nan, 0.0), I want to append the resampled data to a collecting DataFrame and then replace the NaN values in one pass.

So instead of

user_activities = user.groupby(["DOC_ACC_DT", "DOC_ACTV_CD"]).agg("sum")["SUM_DOC_CNT"].unstack().resample("1D").replace(np.nan, 0)
df = df.append(user_activities[activity].rename(user_id))

I want

user_activities = user.groupby(["DOC_ACC_DT", "DOC_ACTV_CD"]).agg("sum")["SUM_DOC_CNT"].unstack().resample("1D")
df = df.append(user_activities[activity].rename(user_id))

but that does not work because user_activities is not a DataFrame after resample().

The error suggests that I try apply() but that method expects a parameter:

/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.pyc in _make_wrapper(self, name)
    507                    "using the 'apply' method".format(kind, name,
    508                                                      type(self).__name__))
--> 509             raise AttributeError(msg)
    510 
    511         # need to setup the selection

AttributeError: Cannot access callable attribute 'rename' of 'SeriesGroupBy' objects, try using the 'apply' method

How can I solve this issue?

Stefan Falk asked Sep 14 '16 13:09


1 Answer

The interface to .resample changed in pandas 0.18.0 to be more groupby-like and hence more flexible, i.e. resample no longer returns a DataFrame: it is now lazily evaluated, and the result is only computed when you call an aggregation or interpolation method on it.
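To illustrate the lazy evaluation, here is a minimal sketch on made-up data (the frame, its column name "a" and the dates are assumptions for the example, not taken from the question). The object returned by resample() is not a DataFrame; calling an aggregation such as mean() on it produces one.

```python
import pandas as pd

# Made-up daily data with a gap on 2016-09-03 (illustrative only).
idx = pd.to_datetime(["2016-09-01", "2016-09-02", "2016-09-04"])
df = pd.DataFrame({"a": [1.0, 2.0, 4.0]}, index=idx)

# resample() only sets up the binning; nothing is computed yet.
resampler = df.resample("1D")
print(type(resampler).__name__)

# An aggregation materialises the result as a regular DataFrame.
# The missing day shows up as a NaN row because its bin is empty.
daily = resampler.mean()
print(daily)
```

Methods that exist on a DataFrame but not on the resampler (such as replace or rename) only become available after this step.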

I suggest reading the notes on the resample API changes: http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#resample-api

See also:

  • http://pandas.pydata.org/pandas-docs/stable/timeseries.html#resampling

  • http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.resample.html

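For upsampling (to a higher frequency), interpolate the new rows:

df.resample("1D").interpolate()

For downsampling (to a lower frequency), aggregate each bin.

Using the mean:

df.resample("1D").mean()

Using OHLC, i.e. the open, high, low and close values (the first, maximum, minimum and last value of each bin):

df.resample("1D").ohlc()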
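Applied to the question, one possible approach is to materialise each resampled frame with asfreq(), which keeps missing days as NaN, collect the per-user series, and fill the NaN values once at the end. The following is only a sketch under assumptions: the sample data and the USER_ID column are hypothetical, and pd.concat is used in place of the question's df.append, which was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical sample data. DOC_ACC_DT, DOC_ACTV_CD and SUM_DOC_CNT follow
# the question; the USER_ID column and all values are assumptions.
raw = pd.DataFrame({
    "USER_ID": ["u1", "u1", "u2", "u2"],
    "DOC_ACC_DT": pd.to_datetime(
        ["2016-09-01", "2016-09-03", "2016-09-01", "2016-09-02"]
    ),
    "DOC_ACTV_CD": ["A", "A", "A", "A"],
    "SUM_DOC_CNT": [1, 2, 3, 4],
})
activity = "A"

rows = []
for user_id, user in raw.groupby("USER_ID"):
    user_activities = (
        user.groupby(["DOC_ACC_DT", "DOC_ACTV_CD"])["SUM_DOC_CNT"]
        .sum()
        .unstack()
        .resample("1D")
        .asfreq()  # turn the resampler into a DataFrame; missing days stay NaN
    )
    rows.append(user_activities[activity].rename(user_id))

# Align all users on the union of days, then replace NaN in a single pass.
matrix = pd.concat(rows, axis=1).sort_index().T.fillna(0.0)
print(matrix)
```

Each row of matrix is one user and each column one day. If the counts within a day should be aggregated rather than passed through, sum() or mean() on the resampler would take the place of asfreq().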
scls answered Sep 19 '22 15:09