Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas DatetimeIndex TypeError

I attempting to do what was done here: Pandas resampling with custom volume weighted aggregation but am hitting a TypeError with my Index.

I have data like:

                         Dates       P   Q
0   2020-09-07 01:20:24.738686  7175.0  21
1   2020-09-07 01:45:27.540590  7150.0   7
2   2020-09-07 03:48:49.120607  7125.0   4
3   2020-09-07 04:45:50.972042  7125.0   6
4   2020-09-07 05:36:23.139612  7125.0   2

I check the type using print(df.dtypes) which returns:

Dates    datetime64[ns]
P               float64
Q                 int64
dtype: object

I then set the index to be the dates using df = df.set_index(pd.DatetimeIndex(df['Dates']))

And then I drop the Dates column to make it easier to read using df = df.drop(['Dates'], axis=1)

This gives me

                                 P   Q
Dates                                 
2020-09-07 01:20:24.738686  7175.0  21
2020-09-07 01:45:27.540590  7150.0   7
2020-09-07 03:48:49.120607  7125.0   4
2020-09-07 04:45:50.972042  7125.0   6
2020-09-07 05:36:23.139612  7125.0   2

I then attempt to resample:

def vwap(data):
    price = data.P
    quantity = data.Q

    top = sum(price * quantity)
    bottom = sum(quantity)

    return top / bottom

df2 = df.resample("5h",axis=1).apply(vwap)

This results in the error TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'

Looking at other stack overflow entries with similar names, their issues are mostly that the datetime only looks like a datetime, but isn't actually formatted as a datetime. This isn't the case here as we can see earlier with that the Dates column has type datetime64[ns]

Further, if I do print(df.index.dtype), I get:

datetime64[ns]

Any suggestions? Happy to clarify anything or provide more code if it would help.

like image 271
F1rools22 Avatar asked Oct 15 '22 00:10

F1rools22


1 Answers

Removing the axis=1 argument and using pd.Grouper works:

df.groupby(pd.Grouper(freq="5h")).apply(vwap)
Dates
2020-09-07 00:00:00    7157.236842
2020-09-07 05:00:00    7125.000000
dtype: float64

If you want a dataframe with an informative column name, use reset_index:

df.groupby(pd.Grouper(freq="5h")).apply(vwap).reset_index(name="vwap")
                Dates         vwap
0 2020-09-07 00:00:00  7157.236842
1 2020-09-07 05:00:00  7125.000000
like image 170
Erfan Avatar answered Oct 19 '22 00:10

Erfan