Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Rolling pivot table in python

I have a df (sample is pasted here at the end). I am looking to find which tradePrice had the most tradeVolume per last 10 min, or any other rolling period.

This is the pivot table done in xls which is based on the sample data attached.

                                      minute                
tradePrice  Data                   0    1   10  Total Result    
12548       Sum - tradeVolume               3   3   
            Count - tradePrice              2   2   
12548.5     Sum - tradeVolume               1   1   
            Count - tradePrice              1   1   
12549       Sum - tradeVolume               1   1   
            Count - tradePrice              1   1   
12549.5     Sum - tradeVolume               95  95  
            Count - tradePrice              5   5   
12550       Sum - tradeVolume               6   6   
            Count - tradePrice              4   4   
12559       Sum - tradeVolume       93          93  
            Count - tradePrice      1           1   
12559.5     Sum - tradeVolume           1       1   
            Count - tradePrice          1       1   
12560       Sum - tradeVolume           5       5   
            Count - tradePrice          4       4   
12560.5     Sum - tradeVolume       3   5       8   
            Count - tradePrice     3    2       5   
12561       Sum - tradeVolume      4    5       9   
            Count - tradePrice     2    3       5   
12561.5     Sum - tradeVolume      3    2       5   
            Count - tradePrice     3    1       4   
12562       Sum - tradeVolume      9    7       16  
            Count - tradePrice     8    1       9   
12562.5     Sum - tradeVolume       6   2       8   
            Count - tradePrice     2    2       4   
12563       Sum - tradeVolume      2            2   
            Count - tradePrice     1            1   
Total Sum - tradeVolume     120 27  106 253 
Total Count - tradePrice    20  14  13  47  

The result need to be a df something like this searching for the price with the most volume traded):

    Price   Volume
02:00:00 AM 12559   93
02:01:00 AM 12562   7
02:10:00 AM 12549.5 95

In order to get 1 min. results I groupby & applied the following function

def f(x):  # function to find the POC price and volume
    a = x['tradePrice'].value_counts().index[0]
    b = x.loc[x['tradePrice'] == a, 'tradeVolume'].sum()
    return pd.Series([a, b], ['POC_Price', 'POC_Volume'])

groupbytime = (str(Time)+"min")#ther is a column name by this
groups = df.groupby(groupbytime,as_index=True)
df_POC = groups.apply(f)  #applys the function of the POC on the grouped data                                                                  

My question is: How can I get the same solution but per rolling time period (which can not be less than 1 min) so the expected result (which price was traded with the max volume) for the last 10 minutes is:

    Price   Volume
02:10:00 AM 12549.5 95

thanks in advance!

The sample data:

    dateTime    tradePrice  tradeVolume 1min    time_of_day_10  time_of_day_30  date    hour    minute
0   2017-09-19 02:00:04 12559   93  2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
49  2017-09-19 02:00:11 12562   1   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
50  2017-09-19 02:00:12 12563   2   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
51  2017-09-19 02:00:12 12562   1   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
122 2017-09-19 02:00:34 12561.5 1   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
123 2017-09-19 02:00:34 12562   1   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
127 2017-09-19 02:00:34 12562   1   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
129 2017-09-19 02:00:35 12561   2   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
130 2017-09-19 02:00:35 12560.5 1   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
131 2017-09-19 02:00:35 12561.5 1   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
135 2017-09-19 02:00:39 12562   1   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
136 2017-09-19 02:00:39 12562   1   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
137 2017-09-19 02:00:43 12561.5 1   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
138 2017-09-19 02:00:43 12561   2   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
139 2017-09-19 02:00:43 12560.5 1   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
140 2017-09-19 02:00:43 12560.5 1   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
152 2017-09-19 02:00:45 12562   2   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
153 2017-09-19 02:00:46 12562.5 4   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
166 2017-09-19 02:00:58 12562   1   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
167 2017-09-19 02:00:58 12562.5 2   2017-09-19 02:00:00 02:00:00    02:00:00    2017-09-19  2   0
168 2017-09-19 02:01:00 12562   7   2017-09-19 02:01:00 02:00:00    02:00:00    2017-09-19  2   1
169 2017-09-19 02:01:00 12562.5 1   2017-09-19 02:01:00 02:00:00    02:00:00    2017-09-19  2   1
170 2017-09-19 02:01:00 12562.5 1   2017-09-19 02:01:00 02:00:00    02:00:00    2017-09-19  2   1
171 2017-09-19 02:01:00 12561.5 2   2017-09-19 02:01:00 02:00:00    02:00:00    2017-09-19  2   1
175 2017-09-19 02:01:03 12561   1   2017-09-19 02:01:00 02:00:00    02:00:00    2017-09-19  2   1
176 2017-09-19 02:01:03 12561   3   2017-09-19 02:01:00 02:00:00    02:00:00    2017-09-19  2   1
187 2017-09-19 02:01:07 12560.5 2   2017-09-19 02:01:00 02:00:00    02:00:00    2017-09-19  2   1
188 2017-09-19 02:01:08 12561   1   2017-09-19 02:01:00 02:00:00    02:00:00    2017-09-19  2   1
189 2017-09-19 02:01:10 12560   1   2017-09-19 02:01:00 02:00:00    02:00:00    2017-09-19  2   1
190 2017-09-19 02:01:10 12560   1   2017-09-19 02:01:00 02:00:00    02:00:00    2017-09-19  2   1
191 2017-09-19 02:01:10 12559.5 1   2017-09-19 02:01:00 02:00:00    02:00:00    2017-09-19  2   1
192 2017-09-19 02:01:11 12560   1   2017-09-19 02:01:00 02:00:00    02:00:00    2017-09-19  2   1
193 2017-09-19 02:01:12 12560   2   2017-09-19 02:01:00 02:00:00    02:00:00    2017-09-19  2   1
194 2017-09-19 02:01:12 12560.5 3   2017-09-19 02:01:00 02:00:00    02:00:00    2017-09-19  2   1
593 2017-09-19 02:10:00 12550   1   2017-09-19 02:10:00 02:10:00    02:00:00    2017-09-19  2   10
594 2017-09-19 02:10:00 12549.5 12  2017-09-19 02:10:00 02:10:00    02:00:00    2017-09-19  2   10
604 2017-09-19 02:10:12 12548.5 1   2017-09-19 02:10:00 02:10:00    02:00:00    2017-09-19  2   10
605 2017-09-19 02:10:15 12549.5 22  2017-09-19 02:10:00 02:10:00    02:00:00    2017-09-19  2   10
606 2017-09-19 02:10:16 12549.5 21  2017-09-19 02:10:00 02:10:00    02:00:00    2017-09-19  2   10
636 2017-09-19 02:10:45 12548   1   2017-09-19 02:10:00 02:10:00    02:00:00    2017-09-19  2   10
637 2017-09-19 02:10:47 12548   2   2017-09-19 02:10:00 02:10:00    02:00:00    2017-09-19  2   10
638 2017-09-19 02:10:47 12549.5 23  2017-09-19 02:10:00 02:10:00    02:00:00    2017-09-19  2   10
639 2017-09-19 02:10:48 12549.5 17  2017-09-19 02:10:00 02:10:00    02:00:00    2017-09-19  2   10
640 2017-09-19 02:10:49 12549   1   2017-09-19 02:10:00 02:10:00    02:00:00    2017-09-19  2   10
665 2017-09-19 02:10:58 12550   1   2017-09-19 02:10:00 02:10:00    02:00:00    2017-09-19  2   10
666 2017-09-19 02:10:58 12550   1   2017-09-19 02:10:00 02:10:00    02:00:00    2017-09-19  2   10
667 2017-09-19 02:10:58 12550   3   2017-09-19 02:10:00 02:10:00    02:00:00    2017-09-19  2   10
like image 730
Giladbi Avatar asked Mar 23 '18 13:03

Giladbi


People also ask

What is rolling method in Python?

The rolling() function is used to provide rolling window calculations. Syntax: Series.rolling(self, window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None) Parameters: Name.

How do you create a pivot table in Python?

Creating a pivot table. By specifying the index and columns parameters in the pd. pivot_table() function, you can determine which features should appear in the columns and rows. In the values parameter, you should specify which feature should be used to fill in the cell values.

How does rolling work in pandas?

Window Rolling Mean (Moving Average)The moving average calculation creates an updated average value for each row based on the window we specify. The calculation is also called a “rolling mean” because it's calculating an average of values within a specified range for each row as you go along the DataFrame.

What is pivot_table in pandas?

Pandas DataFrame: pivot_table() function The pivot_table() function is used to create a spreadsheet-style pivot table as a DataFrame. The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame.


1 Answers

If i understand your problem correctly you need to chose a time granuality and a time window. Then you can do this with a combination of groupby + unstack + rolling.

First the groupby:

time_grain = '1min'
df = df.groupby([pd.Grouper(key='dateTime', freq=time_grain),'tradePrice']).tradeVolume.sum()
dateTime             tradePrice
2017-09-19 02:00:00  12559.0       93
                     12560.5        3
                     12561.0        4
                     12561.5        3
                     12562.0        9
                     12562.5        6
                     12563.0        2
2017-09-19 02:01:00  12559.5        1
                     12560.0        5
                     12560.5        5
                     12561.0        5
                     12561.5        2
                     12562.0        7
                     12562.5        2
2017-09-19 02:10:00  12548.0        3
                     12548.5        1
                     12549.0        1
                     12549.5       95
                     12550.0        6
Name: tradeVolume, dtype: int64

Then unstack + rolling:

window_size = '10min'
df = df.unstack('tradePrice').rolling(window_size).sum()
tradePrice           12548.0  12548.5  12549.0  12549.5  12550.0  12559.0  \
dateTime                                                                    
2017-09-19 02:00:00      NaN      NaN      NaN      NaN      NaN     93.0   
2017-09-19 02:01:00      NaN      NaN      NaN      NaN      NaN     93.0   
2017-09-19 02:10:00      3.0      1.0      1.0     95.0      6.0      NaN   

tradePrice           12559.5  12560.0  12560.5  12561.0  12561.5  12562.0  \
dateTime                                                                    
2017-09-19 02:00:00      NaN      NaN      3.0      4.0      3.0      9.0   
2017-09-19 02:01:00      1.0      5.0      8.0      9.0      5.0     16.0   
2017-09-19 02:10:00      1.0      5.0      5.0      5.0      2.0      7.0   

tradePrice           12562.5  12563.0  
dateTime                               
2017-09-19 02:00:00      6.0      2.0  
2017-09-19 02:01:00      8.0      2.0  
2017-09-19 02:10:00      2.0      NaN

Finally stack back tradePrice into the index and find the index with highest value for every time period:

df = df.stack('tradePrice')
idx_list = df.groupby('dateTime').idxmax()
result = df.loc[idx_list]
dateTime             tradePrice
2017-09-19 02:00:00  12559.0       93.0
2017-09-19 02:01:00  12559.0       93.0
2017-09-19 02:10:00  12549.5       95.0
dtype: float64

Note that rolling defaults to a min number of observations of 1 if you use it with a time offset. Thats why you get 3 result rows.

I think the biggest drawback of this approach is that for a big dataframe with a high number of price points this will take a lot of memory (Because for every price point a new column is generated).

like image 130
P.Tillmann Avatar answered Oct 03 '22 09:10

P.Tillmann