
pandas - efficiently computing minutely returns as columns on intraday data

Tags: python, pandas

I have a DataFrame that looks like this:

        closingDate                Time   Last
0        1997-09-09 2018-12-13 00:00:00  1000
1        1997-09-09 2018-12-13 00:01:00  1002      
2        1997-09-09 2018-12-13 00:02:00  1001   
3        1997-09-09 2018-12-13 00:03:00  1005

I want to create a DataFrame with roughly 1440 columns labeled by time of day, one row per closingDate, where each value is the return over the prior minute:

        closingDate            00:00:00   00:01:00   00:02:00
0        1997-09-09 2018-12-13  -0.08        0.02     -0.001    ...
1        1997-09-10 2018-12-13        ...

My issue is that this is a very large DataFrame (several GB), and I need to do this operation multiple times. Both time and memory efficiency matter, but time matters more. Is there a vectorized, built-in way to do this in pandas?

Évariste Galois asked Nov 06 '22 23:11


1 Answer

You can do this by grouping the data and shifting the time series within each group, which keeps the calculation vectorized.

First, group your data by closingDate.

g = df.groupby("closingDate")

Next, shift the data within each group by one row, which is one minute if every day has a row for each minute.

shifted = g.shift(periods=1)

This creates a new DataFrame in which Last holds the value from the previous minute. Now you can join it to your original DataFrame on the index.

df = df.merge(shifted, left_index=True, right_index=True)

The merge keeps the original columns with an _x suffix and adds the shifted columns with a _y suffix, which you can use to compute the return.

df["Diff"] = (df["Last_x"] - df["Last_y"]) / df["Last_y"]
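The merge can be skipped if only the Last column is shifted: selecting a column from the grouped object returns a Series aligned to the original index, so the return can be computed in place without duplicating the Time column. A minimal sketch on made-up sample data (the values and the two-day layout are assumptions for illustration, not the asker's data):

```python
import pandas as pd

# Made-up sample: two closing dates, three minutes each
df = pd.DataFrame({
    "closingDate": pd.to_datetime(["1997-09-09"] * 3 + ["1997-09-10"] * 3),
    "Time": pd.to_datetime(
        ["2018-12-13 00:00:00", "2018-12-13 00:01:00", "2018-12-13 00:02:00"] * 2
    ),
    "Last": [1000.0, 1002.0, 1001.0, 2000.0, 2010.0, 1990.0],
})

# Previous row's price within each closingDate; the index matches df
prev = df.groupby("closingDate")["Last"].shift(1)

# Simple return over the prior minute
df["Diff"] = (df["Last"] - prev) / prev
```

This assumes the rows are already sorted by Time within each closingDate and that no minutes are missing; otherwise the previous row is not necessarily the previous minute.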

You now have all the data you're looking for. If you need each minute to be its own column, you can pivot the results. Because the shift is applied per closingDate group, values are never shifted across days, so the first observation of each day is NaN.
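The pivot step could look roughly like the sketch below. It starts from a made-up long-format frame that already has a Diff column; the sample values and the helper column name minute are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Made-up long-format result: one row per (closingDate, Time), Diff already computed
long_df = pd.DataFrame({
    "closingDate": pd.to_datetime(["1997-09-09"] * 3 + ["1997-09-10"] * 3),
    "Time": pd.to_datetime(
        ["2018-12-13 00:00:00", "2018-12-13 00:01:00", "2018-12-13 00:02:00"] * 2
    ),
    "Diff": [np.nan, 0.002, -0.001, np.nan, 0.005, -0.01],
})

# Use the time of day as the column label (hypothetical helper column)
long_df["minute"] = long_df["Time"].dt.strftime("%H:%M:%S")

# One row per closingDate, one column per minute
wide = long_df.pivot(index="closingDate", columns="minute", values="Diff")
```

The 00:00:00 column is all NaN here because the first minute of each day has no prior value. Note that pivot raises a ValueError if any (closingDate, minute) pair occurs more than once, so duplicates would need to be resolved first.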

vielkind answered Nov 14 '22 21:11