 

Pandas Groupby cumulative amount

Tags:

python

pandas

I'm trying to resample/group a tick list into non-time-series OHLC bars (range bars, volume bars, etc.).

First Challenge (group by cumulative volume):

Original data DF:

   symbol         utime                time   price  vol   cc   cv ttype  \
id                                                                          
1   DOLX16  1.476961e+09 2016-10-20 09:00:37  3179.0    5  120  120     R   
2   DOLX16  1.476961e+09 2016-10-20 09:00:37  3179.0   10  735  120     R   
3   DOLX16  1.476961e+09 2016-10-20 09:00:37  3179.0   20  735  120     R   
4   DOLX16  1.476961e+09 2016-10-20 09:00:37  3179.0   30  735    3     R   
5   DOLX16  1.476961e+09 2016-10-20 09:00:37  3179.0    5  735  147     R   

I need to "groupby" the rows while the cumulative sum of the vol column is <= a constant (for example 500); when 500 is reached, the sum starts again from zero...

PSEUDO:

vol_amount = 500

adict      = {'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last'}

ohlc_vol   = data.groupby(data['vol'].cumsum() <= vol_amount)['price'].agg(adict)

ohlc_vol['ticks_count'] = data.groupby(data['vol'].cumsum() <= vol_amount)['vol'].count()
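(Editor's note: the grouping key in the pseudo is a boolean mask — `vol.cumsum() <= vol_amount` is True only until the running total first exceeds the constant, so it cannot produce repeating buckets. A quick check with illustrative volumes, not the real feed:)

```python
import pandas as pd

# Illustrative volumes only (not the real tick data)
vol = pd.Series([5, 10, 20, 30, 5] * 3)
mask = vol.cumsum() <= 50

# The mask flips from True to False exactly once, so groupby(mask)
# yields at most two groups rather than repeating 50-unit buckets.
print(mask.nunique())
```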

Thank you for any help!

Lucas asked Mar 10 '17 at 21:03

1 Answer

Consider splitting the cumulative sum of volume into multiples of vol_amount using the integer (floor) division operator, //. Then use that grouping in the price aggregations:

vol_amount = 100

data['volcum'] = data['vol'].cumsum()
# remainder of the cumulative volume after removing full vol_amount blocks
data['volcumgrp'] = data['volcum'] - ((data['volcum'] // vol_amount) * vol_amount)

adict = {'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last'}

ohlc_vol = data.groupby('volcumgrp')['price'].agg(adict)
ohlc_vol['ticks_count'] = data.groupby('volcumgrp')['vol'].count()
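Incidentally, the two-step expression is just the remainder, so pandas' modulo operator produces the same grouping key (checked here on illustrative running totals):

```python
import pandas as pd

vol_amount = 100
volcum = pd.Series([5, 15, 105, 210, 345])  # illustrative running totals

# Subtracting the floored multiple equals taking the remainder
two_step = volcum - ((volcum // vol_amount) * vol_amount)
modulo = volcum % vol_amount
print(two_step.equals(modulo))
```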

To demonstrate, the posted dataframe is stacked repeatedly to build a larger sample:

from io import StringIO
import pandas as pd

text = '''
id  symbol         utime                time   price  vol   cc   cv ttype
1   DOLX16  1.476961e+09 2016-10-20 09:00:37  3179.0    5  120  120     R   
2   DOLX16  1.476961e+09 2016-10-20 09:00:37  3179.0   10  735  120     R   
3   DOLX16  1.476961e+09 2016-10-20 09:00:37  3179.0   20  735  120     R   
4   DOLX16  1.476961e+09 2016-10-20 09:00:37  3179.0   30  735    3     R   
5   DOLX16  1.476961e+09 2016-10-20 09:00:37  3179.0    5  735  147     R
'''

data = pd.concat([pd.read_table(StringIO(text), sep=r"\s+")
                  for _ in range(5)])

# RANDOMIZE PRICE FOR DEMO
from random import randint, seed
seed(a=48)
data['price'] = [float(randint(3175,3199)) for i in range(25)]

# VOLUME CUMULATIVE GROUP
vol_amount = 100
data['volcum'] = data['vol'].cumsum()
# remainder of the cumulative volume after removing full vol_amount blocks
data['volcumgrp'] = data['volcum'] - ((data['volcum'] // vol_amount) * vol_amount)
print(data)

# PRICE AGGREGATION
adict = {'open': 'first', 'high':'max', 'low':'min', 'close' : 'last'}
ohlc_vol = data.groupby('volcumgrp')['price'].agg(adict)
ohlc_vol['ticks_count'] = data.groupby('volcumgrp')['vol'].count()
print(ohlc_vol)

Output

data df (grouped within every 100 units of cumulative volume)

       id        symbol       utime      time   price  vol   cc   cv ttype  volcum  volcumgrp
1  DOLX16  1.476961e+09  2016-10-20  09:00:37  3192.0    5  120  120     R       5          5
2  DOLX16  1.476961e+09  2016-10-20  09:00:37  3185.0   10  735  120     R      15         15
3  DOLX16  1.476961e+09  2016-10-20  09:00:37  3179.0   20  735  120     R      35         35
4  DOLX16  1.476961e+09  2016-10-20  09:00:37  3192.0   30  735    3     R      65         65
5  DOLX16  1.476961e+09  2016-10-20  09:00:37  3197.0    5  735  147     R      70         70
1  DOLX16  1.476961e+09  2016-10-20  09:00:37  3192.0    5  120  120     R      75         75
2  DOLX16  1.476961e+09  2016-10-20  09:00:37  3184.0   10  735  120     R      85         85
3  DOLX16  1.476961e+09  2016-10-20  09:00:37  3191.0   20  735  120     R     105          5
4  DOLX16  1.476961e+09  2016-10-20  09:00:37  3181.0   30  735    3     R     135         35
5  DOLX16  1.476961e+09  2016-10-20  09:00:37  3197.0    5  735  147     R     140         40
1  DOLX16  1.476961e+09  2016-10-20  09:00:37  3199.0    5  120  120     R     145         45
2  DOLX16  1.476961e+09  2016-10-20  09:00:37  3188.0   10  735  120     R     155         55
3  DOLX16  1.476961e+09  2016-10-20  09:00:37  3180.0   20  735  120     R     175         75
4  DOLX16  1.476961e+09  2016-10-20  09:00:37  3179.0   30  735    3     R     205          5
5  DOLX16  1.476961e+09  2016-10-20  09:00:37  3196.0    5  735  147     R     210         10
1  DOLX16  1.476961e+09  2016-10-20  09:00:37  3178.0    5  120  120     R     215         15
2  DOLX16  1.476961e+09  2016-10-20  09:00:37  3190.0   10  735  120     R     225         25
3  DOLX16  1.476961e+09  2016-10-20  09:00:37  3195.0   20  735  120     R     245         45
4  DOLX16  1.476961e+09  2016-10-20  09:00:37  3182.0   30  735    3     R     275         75
5  DOLX16  1.476961e+09  2016-10-20  09:00:37  3181.0    5  735  147     R     280         80
1  DOLX16  1.476961e+09  2016-10-20  09:00:37  3199.0    5  120  120     R     285         85
2  DOLX16  1.476961e+09  2016-10-20  09:00:37  3191.0   10  735  120     R     295         95
3  DOLX16  1.476961e+09  2016-10-20  09:00:37  3192.0   20  735  120     R     315         15
4  DOLX16  1.476961e+09  2016-10-20  09:00:37  3191.0   30  735    3     R     345         45
5  DOLX16  1.476961e+09  2016-10-20  09:00:37  3179.0    5  735  147     R     350         50

ohlc_vol df

             open     low    high   close  ticks_count
volcumgrp                                             
5          3192.0  3179.0  3192.0  3179.0            3
10         3196.0  3196.0  3196.0  3196.0            1
15         3185.0  3178.0  3192.0  3192.0            3
25         3190.0  3190.0  3190.0  3190.0            1
35         3179.0  3179.0  3181.0  3181.0            2
40         3197.0  3197.0  3197.0  3197.0            1
45         3199.0  3191.0  3199.0  3191.0            3
50         3179.0  3179.0  3179.0  3179.0            1
55         3188.0  3188.0  3188.0  3188.0            1
65         3192.0  3192.0  3192.0  3192.0            1
70         3197.0  3197.0  3197.0  3197.0            1
75         3192.0  3180.0  3192.0  3182.0            3
80         3181.0  3181.0  3181.0  3181.0            1
85         3184.0  3184.0  3199.0  3199.0            2
95         3191.0  3191.0  3191.0  3191.0            1
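Note that grouping on the remainder pools non-adjacent ticks that happen to share a remainder value (see group 5 above, which mixes rows from different passes through the data). If strictly consecutive volume bars are wanted instead, grouping on the quotient `cumsum() // vol_amount` assigns each tick a sequential bar number. A sketch with made-up ticks (using named aggregation, available from pandas 0.25, since dict renaming in `agg` was later removed):

```python
import pandas as pd

# Made-up ticks for illustration only
data = pd.DataFrame({
    'price': [3179.0, 3185.0, 3190.0, 3181.0, 3195.0],
    'vol':   [40, 50, 30, 60, 20],
})
vol_amount = 100

# Bar number = how many full vol_amount blocks precede this tick
data['bar'] = data['vol'].cumsum() // vol_amount

ohlc = data.groupby('bar')['price'].agg(
    open='first', high='max', low='min', close='last')
ohlc['ticks_count'] = data.groupby('bar')['vol'].count()
print(ohlc)
```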
Parfait answered Sep 28 '22 at 18:09