Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Dataframe cell to be locked and used for a running balance calculation conditional of result on another cell on same row

Say I have the following dataframe:

import pandas as pd
df = pd.DataFrame()

df['A'] = ('1/05/2019','2/05/2019','3/05/2019','4/05/2019','5/05/2019','6/05/2019','7/05/2019','8/05/2019','9/05/2019','10/05/2019','11/05/2019','12/05/2019','13/05/2019','14/05/2019','15/05/2019','16/05/2019','17/05/2019','18/05/2019','19/05/2019','20/05/2019')

df['B'] = ('SIT','SCLOSE', 'SHODL', 'SHODL', 'SHODL', 'SHODL', 'SHODL', 'SELL','SIT','SIT','BCLOSE', 'BHODL', 'BHODL', 'BHODL', 'BHODL', 'BHODL', 'BHODL','BUY','SIT','SIT')

df['C'] = (0.00,1.00,10.00, 5.00,6.00,-6.00, 6.00, 0.00,0.00,0.00,-8.00,33.00,-15.00,6.00,-1.00,5.00,10.00,0.00,0.00,0.00)

df.loc[19, 'D'] = 100.0000

As can be seen I am starting column D with 100 at the last row.

I am trying to code a calculation for column D so starting from the bottom row (row 19) when a BUY or SELL is shown on column B then the number on column D is locked (eg the 100) and used for a calculation based on col C for each SHODL or BHODL until the row after a BCLOSE or an SCLOSE is shown.

The locked number is used to calculate a running balance based on the percentages that are in column C. As you can see on row 16 column C has '10' representing 10%. As 10% of 100 = 10 the new runnning balance is 110.

Row 15 column C has 5% as such 5 is added to the running balance to result in 115.

The next row 14 column C has a -1% change as such 1% of 100 is = 1 and therefore the new running balance is 114 and so on.

The following are the results that should be returned in col D of the dataframe once the right code is run

df['D'] = ('158.60','158.60', '157.30', '144.30', '137.80', '130.00', '137.80', '130.00','130.00','130.00','130.00', '138.00', '105.00', '120.00', '114.00', '115.00', '110.00','100.00','100.00','100.00')

This continues until after a SCLOSE or a BCLOSE is shown as a BCLOSE or SCLOSE row is the final row where the running balance is calculated.

As you can see this process is restarted when either a new BUY or SELL is shown.

like image 907
ds882 Avatar asked Feb 28 '20 04:02

ds882


1 Answers

Next starting value depends on the last value of previous group, so I think it can't be vectorized. It requires some kind of iterative process. I came up with solution doing iteratively on groups of groupby. Reverse df and assign to df1. Working on each group of df1 and assign the final list of groups to the original df

df1 = df[::-1]
s = df1.B.isin(['BCLOSE','SCLOSE']).shift(fill_value=False).cumsum()
grps = df1.groupby(s)
init_val= 100
l = []
for _, grp in grps:
    s = grp.C * 0.01 * init_val
    s.iloc[0] = init_val
    s = s.cumsum()
    init_val = s.iloc[-1]
    l.append(s)

df['D'] = pd.concat(l)

Out[50]:
             A       B     C      D
0    1/05/2019     SIT   0.0  158.6
1    2/05/2019  SCLOSE   1.0  158.6
2    3/05/2019   SHODL  10.0  157.3
3    4/05/2019   SHODL   5.0  144.3
4    5/05/2019   SHODL   6.0  137.8
5    6/05/2019   SHODL  -6.0  130.0
6    7/05/2019   SHODL   6.0  137.8
7    8/05/2019    SELL   0.0  130.0
8    9/05/2019     SIT   0.0  130.0
9   10/05/2019     SIT   0.0  130.0
10  11/05/2019  BCLOSE  -8.0  130.0
11  12/05/2019   BHODL  33.0  138.0
12  13/05/2019   BHODL -15.0  105.0
13  14/05/2019   BHODL   6.0  120.0
14  15/05/2019   BHODL  -1.0  114.0
15  16/05/2019   BHODL   5.0  115.0
16  17/05/2019   BHODL  10.0  110.0
17  18/05/2019     BUY   0.0  100.0
18  19/05/2019     SIT   0.0  100.0
19  20/05/2019     SIT   0.0  100.0        
like image 111
Andy L. Avatar answered Nov 09 '22 04:11

Andy L.