Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calculating cumulative returns with pandas dataframe

I have this dataframe

Poloniex_DOGE_BTC   Poloniex_XMR_BTC    Daily_rets  perc_ret
172 0.006085    -0.000839   0.003309    0
173 0.006229    0.002111    0.005135    0
174 0.000000    -0.001651   0.004203    0
175 0.000000    0.007743    0.005313    0
176 0.000000    -0.001013   -0.003466   0
177 0.000000    -0.000550   0.000772    0
178 0.000000    -0.009864   0.001764    0

I'm trying to make a running total of daily_rets in perc_ret

however my code just copies the values from daily_rets

df['perc_ret'] = (  df['Daily_rets'] + df['perc_ret'].shift(1) )


Poloniex_DOGE_BTC   Poloniex_XMR_BTC    Daily_rets  perc_ret
172 0.006085    -0.000839   0.003309    NaN
173 0.006229    0.002111    0.005135    0.005135
174 0.000000    -0.001651   0.004203    0.004203
175 0.000000    0.007743    0.005313    0.005313
176 0.000000    -0.001013   -0.003466   -0.003466
177 0.000000    -0.000550   0.000772    0.000772
178 0.000000    -0.009864   0.001764    0.001764
like image 817
David Hancock Avatar asked Feb 12 '16 14:02

David Hancock


People also ask

How do you calculate cumulative returns?

Cumulative return for the two sub-periods is calculated by multiplying each sub-period's return. Cumulative return = (1+20%)*(1+50%) - 1 = 80%

How do you calculate cumulative percentage in pandas?

Cumulative Percentage is calculated by the mathematical formula of dividing the cumulative sum of the column by the mathematical sum of all the values and then multiplying the result by 100. This is also applicable in Pandas Data frames.


3 Answers

If they are daily simple returns and you want a cumulative return, surely you must want a daily compounded number?

df['perc_ret'] = (1 + df.Daily_rets).cumprod() - 1  
# Or:
# df.Daily_rets.add(1).cumprod().sub(1)

>>> df
     Poloniex_DOGE_BTC  Poloniex_XMR_BTC  Daily_rets  perc_ret
172           0.006085         -0.000839    0.003309  0.003309
173           0.006229          0.002111    0.005135  0.008461
174           0.000000         -0.001651    0.004203  0.012700
175           0.000000          0.007743    0.005313  0.018080
176           0.000000         -0.001013   -0.003466  0.014551
177           0.000000         -0.000550    0.000772  0.015335
178           0.000000         -0.009864    0.001764  0.017126

If they are log returns, then you could just use cumsum.

like image 64
Alexander Avatar answered Oct 18 '22 21:10

Alexander


you just cannot simply add them all by using cumsum

for example, if you have array [1.1, 1.1], you supposed to have 2.21, not 2.2

import numpy as np

# daily return:
df['daily_return'] = df['close'].pct_change()

# calculate cumluative return
df['cumluative_return'] = np.exp(np.log1p(df['daily_return']).cumsum())
like image 5
Dong Yi Avatar answered Oct 18 '22 22:10

Dong Yi


If performance is important, use numpy.cumprod:

np.cumprod(1 + df['Daily_rets'].values) - 1

Timings:

#7k rows
df = pd.concat([df] * 1000, ignore_index=True)

In [191]: %timeit np.cumprod(1 + df['Daily_rets'].values) - 1
41 µs ± 282 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [192]: %timeit (1 + df.Daily_rets).cumprod() - 1
554 µs ± 3.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
like image 4
jezrael Avatar answered Oct 18 '22 22:10

jezrael