Rolling pivot table in python

Tags:

I have a df (sample is pasted here at the end). I am looking to find which tradePrice had the most tradeVolume per last 10 min, or any other rolling period.

This is the pivot table done in xls which is based on the sample data attached.

Click to copy

                                      minute                
tradePrice  Data                   0    1   10  Total Result    
12548       Sum - tradeVolume               3   3   
            Count - tradePrice              2   2   
12548.5     Sum - tradeVolume               1   1   
            Count - tradePrice              1   1   
12549       Sum - tradeVolume               1   1   
            Count - tradePrice              1   1   
12549.5     Sum - tradeVolume               95  95  
            Count - tradePrice              5   5   
12550       Sum - tradeVolume               6   6   
            Count - tradePrice              4   4   
12559       Sum - tradeVolume       93          93  
            Count - tradePrice      1           1   
12559.5     Sum - tradeVolume           1       1   
            Count - tradePrice          1       1   
12560       Sum - tradeVolume           5       5   
            Count - tradePrice          4       4   
12560.5     Sum - tradeVolume       3   5       8   
            Count - tradePrice     3    2       5   
12561       Sum - tradeVolume      4    5       9   
            Count - tradePrice     2    3       5   
12561.5     Sum - tradeVolume      3    2       5   
            Count - tradePrice     3    1       4   
12562       Sum - tradeVolume      9    7       16  
            Count - tradePrice     8    1       9   
12562.5     Sum - tradeVolume       6   2       8   
            Count - tradePrice     2    2       4   
12563       Sum - tradeVolume      2            2   
            Count - tradePrice     1            1   
Total Sum - tradeVolume     120 27  106 253 
Total Count - tradePrice    20  14  13  47

The result need to be a df something like this searching for the price with the most volume traded):

Click to copy

    Price   Volume
02:00:00 AM 12559   93
02:01:00 AM 12562   7
02:10:00 AM 12549.5 95

In order to get 1 min. results I groupby & applied the following function

Click to copy

def f(x):  # function to find the POC price and volume
    a = x['tradePrice'].value_counts().index[0]
    b = x.loc[x['tradePrice'] == a, 'tradeVolume'].sum()
    return pd.Series([a, b], ['POC_Price', 'POC_Volume'])

groupbytime = (str(Time)+"min")#ther is a column name by this
groups = df.groupby(groupbytime,as_index=True)
df_POC = groups.apply(f)  #applys the function of the POC on the grouped data

My question is: How can I get the same solution but per rolling time period (which can not be less than 1 min) so the expected result (which price was traded with the max volume) for the last 10 minutes is:

Click to copy

    Price   Volume
02:10:00 AM 12549.5 95

thanks in advance!

The sample data:

Click to copy

    dateTime    tradePrice  tradeVolume 1min    time_of_day_10  time_of_day_30  date    hour    minute
0   2017-09-19 02:00:04 12559   93  2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
49  2017-09-19 02:00:11 12562   1   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
50  2017-09-19 02:00:12 12563   2   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
51  2017-09-19 02:00:12 12562   1   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
122 2017-09-19 02:00:34 12561.5 1   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
123 2017-09-19 02:00:34 12562   1   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
127 2017-09-19 02:00:34 12562   1   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
129 2017-09-19 02:00:35 12561   2   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
130 2017-09-19 02:00:35 12560.5 1   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
131 2017-09-19 02:00:35 12561.5 1   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
135 2017-09-19 02:00:39 12562   1   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
136 2017-09-19 02:00:39 12562   1   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
137 2017-09-19 02:00:43 12561.5 1   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
138 2017-09-19 02:00:43 12561   2   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
139 2017-09-19 02:00:43 12560.5 1   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
140 2017-09-19 02:00:43 12560.5 1   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
152 2017-09-19 02:00:45 12562   2   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
153 2017-09-19 02:00:46 12562.5 4   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
166 2017-09-19 02:00:58 12562   1   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
167 2017-09-19 02:00:58 12562.5 2   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
168 2017-09-19 02:01:00 12562   7   2017-09-19 02:01:00 02:00:00    02:00:00    2017-09-19  2   1
169 2017-09-19 02:01:00 12562.5 1   2017-09-19 02:01:00 02:00:00    02:00:00    2017-09-19  2   1
170 2017-09-19 02:01:00 12562.5 1   2017-09-19 02:01:00 02:00:00    02:00:00    2017-09-19  2   1
171 2017-09-19 02:01:00 12561.5 2   2017-09-19 02:01:00 02:00:00    02:00:00    2017-09-19  2   1
175 2017-09-19 02:01:03 12561   1   2017-09-19 02:01:00 02:00:00    02:00:00    2017-09-19  2   1
176 2017-09-19 02:01:03 12561   3   2017-09-19 02:01:00 02:00:00    02:00:00    2017-09-19  2   1
187 2017-09-19 02:01:07 12560.5 2   2017-09-19 02:01:00 02:00:00    02:00:00    2017-09-19  2   1
188 2017-09-19 02:01:08 12561   1   2017-09-19 02:01:00 02:00:00    02:00:00    2017-09-19  2   1
189 2017-09-19 02:01:10 12560   1   2017-09-19 02:01:00 02:00:00    02:00:00    2017-09-19  2   1
190 2017-09-19 02:01:10 12560   1   2017-09-19 02:01:00 02:00:00    02:00:00    2017-09-19  2   1
191 2017-09-19 02:01:10 12559.5 1   2017-09-19 02:01:00 02:00:00    02:00:00    2017-09-19  2   1
192 2017-09-19 02:01:11 12560   1   2017-09-19 02:01:00 02:00:00    02:00:00    2017-09-19  2   1
193 2017-09-19 02:01:12 12560   2   2017-09-19 02:01:00 02:00:00    02:00:00    2017-09-19  2   1
194 2017-09-19 02:01:12 12560.5 3   2017-09-19 02:01:00 02:00:00    02:00:00    2017-09-19  2   1
593 2017-09-19 02:10:00 12550   1   2017-09-19 02:10:00 02:10:00    02:00:00    2017-09-19  2   10
594 2017-09-19 02:10:00 12549.5 12  2017-09-19 02:10:00 02:10:00    02:00:00    2017-09-19  2   10
604 2017-09-19 02:10:12 12548.5 1   2017-09-19 02:10:00 02:10:00    02:00:00    2017-09-19  2   10
605 2017-09-19 02:10:15 12549.5 22  2017-09-19 02:10:00 02:10:00    02:00:00    2017-09-19  2   10
606 2017-09-19 02:10:16 12549.5 21  2017-09-19 02:10:00 02:10:00    02:00:00    2017-09-19  2   10
636 2017-09-19 02:10:45 12548   1   2017-09-19 02:10:00 02:10:00    02:00:00    2017-09-19  2   10
637 2017-09-19 02:10:47 12548   2   2017-09-19 02:10:00 02:10:00    02:00:00    2017-09-19  2   10
638 2017-09-19 02:10:47 12549.5 23  2017-09-19 02:10:00 02:10:00    02:00:00    2017-09-19  2   10
639 2017-09-19 02:10:48 12549.5 17  2017-09-19 02:10:00 02:10:00    02:00:00    2017-09-19  2   10
640 2017-09-19 02:10:49 12549   1   2017-09-19 02:10:00 02:10:00    02:00:00    2017-09-19  2   10
665 2017-09-19 02:10:58 12550   1   2017-09-19 02:10:00 02:10:00    02:00:00    2017-09-19  2   10
666 2017-09-19 02:10:58 12550   1   2017-09-19 02:10:00 02:10:00    02:00:00    2017-09-19  2   10
667 2017-09-19 02:10:58 12550   3   2017-09-19 02:10:00 02:10:00    02:00:00    2017-09-19  2   10

730

asked Mar 23 '18 13:03

Giladbi

1 Answers

If i understand your problem correctly you need to chose a time granuality and a time window. Then you can do this with a combination of groupby + unstack + rolling.

First the groupby:

Click to copy

time_grain = '1min'
df = df.groupby([pd.Grouper(key='dateTime', freq=time_grain),'tradePrice']).tradeVolume.sum()
dateTime             tradePrice
2017-09-19 02:00:00  12559.0       93
                     12560.5        3
                     12561.0        4
                     12561.5        3
                     12562.0        9
                     12562.5        6
                     12563.0        2
2017-09-19 02:01:00  12559.5        1
                     12560.0        5
                     12560.5        5
                     12561.0        5
                     12561.5        2
                     12562.0        7
                     12562.5        2
2017-09-19 02:10:00  12548.0        3
                     12548.5        1
                     12549.0        1
                     12549.5       95
                     12550.0        6
Name: tradeVolume, dtype: int64

Then unstack + rolling:

Click to copy

window_size = '10min'
df = df.unstack('tradePrice').rolling(window_size).sum()
tradePrice           12548.0  12548.5  12549.0  12549.5  12550.0  12559.0  \
dateTime                                                                    
2017-09-19 02:00:00      NaN      NaN      NaN      NaN      NaN     93.0   
2017-09-19 02:01:00      NaN      NaN      NaN      NaN      NaN     93.0   
2017-09-19 02:10:00      3.0      1.0      1.0     95.0      6.0      NaN   

tradePrice           12559.5  12560.0  12560.5  12561.0  12561.5  12562.0  \
dateTime                                                                    
2017-09-19 02:00:00      NaN      NaN      3.0      4.0      3.0      9.0   
2017-09-19 02:01:00      1.0      5.0      8.0      9.0      5.0     16.0   
2017-09-19 02:10:00      1.0      5.0      5.0      5.0      2.0      7.0   

tradePrice           12562.5  12563.0  
dateTime                               
2017-09-19 02:00:00      6.0      2.0  
2017-09-19 02:01:00      8.0      2.0  
2017-09-19 02:10:00      2.0      NaN

Finally stack back tradePrice into the index and find the index with highest value for every time period:

Click to copy

df = df.stack('tradePrice')
idx_list = df.groupby('dateTime').idxmax()
result = df.loc[idx_list]
dateTime             tradePrice
2017-09-19 02:00:00  12559.0       93.0
2017-09-19 02:01:00  12559.0       93.0
2017-09-19 02:10:00  12549.5       95.0
dtype: float64

Note that rolling defaults to a min number of observations of 1 if you use it with a time offset. Thats why you get 3 result rows.

I think the biggest drawback of this approach is that for a big dataframe with a high number of price points this will take a lot of memory (Because for every price point a new column is generated).

130

answered Oct 03 '22 09:10

P.Tillmann

Related questions
                            
                                How to serve multiple domains which share the application back-end in Flask?
                            
                                Difference between conda and pip installs within a conda environment
                            
                                Can you change kernel in colab notebook?
                            
                                how to enable auto scaling with Celery multi workers?
                            
                                How to screenshot a specified WebElement in Selenium using Python
                            
                                pickle.load - EOFError: Ran out of input
                            
                                Safely running code in a process, redirect stdout in multithreading.Process
                            
                                Indexing numpy array by a numpy array of coordinates
                            
                                Python: Multiple conditions for if statement in Jinja templates
                            
                                Confusion matrix for Clustering in scikit-learn
                            
                                How to detect rectangular items in image with Python
                            
                                How to cleanly shutdown Change Streams with Motor?
                            
                                Simplest lambda function to copy a file from one s3 bucket to another
                            
                                Dropout behavior in Keras with rate=1 (dropping all input units) not as expected
                            
                                Python - Why are classes listed in the list of built-in functions?
                            
                                Should we put all required and their dependent packages on requirement.txt or only required packages
                            
                                Pyinstaller doesn't work properly with threading
                            
                                mape eval metric in xgboost
                            
                                Why Python loops over slices of numpy arrays are faster than fully vectorized operations
                            
                                How does batching work in a seq2seq model in pytorch?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Rolling pivot table in python

Tags:

python

pandas

pandas-groupby

Giladbi

People also ask

1 Answers

P.Tillmann

Recent Activity

Donate For Us