Update: not sure if this is possible without some form of a loop, but np.where will not work here. If the answer is, "you can't", then so be it. If it can be done, it may use something from scipy.signal.
I'd like to vectorize the loop in the code below, but I'm unsure how, due to the recursive nature of the output.
Walk-through of my current setup:
Take a starting amount ($1 million) and a quarterly dollar distribution ($5,000):
dist = 5000.
v0 = float(1e6)
Generate some random security/account returns (decimal form) at monthly freq:
r = pd.Series(np.random.rand(12) * .01,
              index=pd.date_range('2017', freq='M', periods=12))
Create an empty Series that will hold the monthly account values:
value = pd.Series(np.empty_like(r), index=r.index)
Add a "start month" to value. This label will contain v0.
from pandas.tseries import offsets
# pd.concat is used because Series.append was removed in pandas 2.0
value = (pd.concat([value, pd.Series(v0, index=[value.index[0] - offsets.MonthEnd(1)])])
         .sort_index())
The loop I'd like to get rid of is here:
for date in value.index[1:]:
    if date.is_quarter_end:
        value.loc[date] = value.loc[date - offsets.MonthEnd(1)] \
                          * (1 + r.loc[date]) - dist
    else:
        value.loc[date] = value.loc[date - offsets.MonthEnd(1)] \
                          * (1 + r.loc[date])
Combined code:
import pandas as pd
from pandas.tseries import offsets
from pandas import Series
import numpy as np
dist = 5000.
v0 = float(1e6)
r = pd.Series(np.random.rand(12) * .01, index=pd.date_range('2017', freq='M', periods=12))
value = pd.Series(np.empty_like(r), index=r.index)
# pd.concat is used because Series.append was removed in pandas 2.0
value = pd.concat([value, Series(v0, index=[value.index[0] - offsets.MonthEnd(1)])]).sort_index()
for date in value.index[1:]:
    if date.is_quarter_end:
        value.loc[date] = value.loc[date - offsets.MonthEnd(1)] * (1 + r.loc[date]) - dist
    else:
        value.loc[date] = value.loc[date - offsets.MonthEnd(1)] * (1 + r.loc[date])
In pseudocode, what the loop is doing is just:
for each date in index of value:
    if the date is not a quarter end:
        multiply previous value by (1 + r) for that month
    if the date is a quarter end:
        multiply previous value by (1 + r) for that month and subtract dist
The issue is, I don't currently see how vectorization is possible since the successive value depends on whether or not a distribution was taken in the month prior. I get to the desired result, but pretty inefficiently for higher frequency data or larger time periods.

You could use the following code:
cum_r = (1 + r).cumprod()
result = cum_r * v0
for date in r.index[r.index.is_quarter_end]:
    result[date:] -= cum_r[date:] * (dist / cum_r.loc[date])
You would make:
1 cumulative product, (1 + r).cumprod()
1 multiplication of cum_r by the scalar v0
n multiplications of a vector by the scalar dist / cum_r.loc[date]
n vector subtractions
where n is the number of quarter ends.
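To check that removing each distribution together with its later growth reproduces the recursive loop from the question, here is a minimal sketch on plain NumPy arrays rather than a date-indexed Series. The `is_q_end` mask (every third month) and the fixed seed are assumptions made for illustration; the mask stands in for `r.index.is_quarter_end`.

```python
import numpy as np

dist = 5000.0
v0 = 1e6
rng = np.random.default_rng(0)          # fixed seed, illustration only
r = rng.random(12) * 0.01               # 12 monthly returns
is_q_end = (np.arange(1, 13) % 3 == 0)  # assumed mask: months 3, 6, 9, 12

# Reference: the recursive loop from the question.
expected = np.empty(12)
prev = v0
for i in range(12):
    prev = prev * (1 + r[i]) - (dist if is_q_end[i] else 0.0)
    expected[i] = prev

# One pass per quarter end: subtract the distribution and its later growth.
cum_r = np.cumprod(1 + r)
result = cum_r * v0
for q in np.flatnonzero(is_q_end):
    result[q:] -= cum_r[q:] * (dist / cum_r[q])

print(np.allclose(result, expected))
```

The Python-level loop now runs once per quarter end instead of once per month, and each iteration is a vectorized slice operation.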
Based on this code we can optimize further:
cum_r = (1 + r).cumprod()
t = (r.index.is_quarter_end / cum_r).cumsum()
result = cum_r * (v0 - dist * t)
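which is:
1 cumulative product, (1 + r).cumprod()
1 division between two series, r.index.is_quarter_end / cum_r
1 cumulative sum of that division
1 multiplication of that sum by the scalar dist
1 subtraction of dist * t from the scalar v0
1 element-wise multiplication of cum_r with v0 - dist * t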
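Why this works: writing q[t] for 1 at a quarter end and 0 otherwise, the loop computes v[t] = v[t-1] * (1 + r[t]) - dist * q[t]. Dividing both sides by cum_r[t] turns the recursion into a plain running sum, which gives v[t] = cum_r[t] * (v0 - dist * sum of q[s] / cum_r[s] for s up to t). The sketch below checks that closed form against the recursive loop on NumPy arrays; the function names `closed_form` and `recursive`, the mask and the seed are assumptions for illustration.

```python
import numpy as np

def closed_form(r, is_q_end, v0, dist):
    """Account value after each period, with no Python-level loop."""
    cum_r = np.cumprod(1 + r)
    t = np.cumsum(is_q_end / cum_r)   # distributions discounted back to the start
    return cum_r * (v0 - dist * t)

def recursive(r, is_q_end, v0, dist):
    """Reference implementation: the loop from the question."""
    out = np.empty(len(r))
    prev = v0
    for i in range(len(r)):
        prev = prev * (1 + r[i]) - (dist if is_q_end[i] else 0.0)
        out[i] = prev
    return out

rng = np.random.default_rng(1)           # fixed seed, illustration only
r = rng.random(120) * 0.01               # ten years of monthly returns
is_q_end = (np.arange(1, 121) % 3 == 0)  # assumed mask: every third month

print(np.allclose(closed_form(r, is_q_end, 1e6, 5000.0),
                  recursive(r, is_q_end, 1e6, 5000.0)))
```

One caveat: the two versions agree only up to floating-point rounding, so compare them with `np.allclose` rather than `==`.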
Ok... I'm taking a stab at this.
import numpy as np
import pandas as pd
# Define a generator for accumulating deposits and returns
def gen(lst):
    acu = 0
    for r, v in lst:
        yield acu * (1 + r) + v
        acu *= (1 + r)
        acu += v
dist = 5000.
v0 = float(1e6)
random_returns = np.random.rand(12) * 0.1
#Create the index.
index=pd.date_range('2016-12-31', freq='M', periods=13)
#Generate a return so that the value at i equals the return from i-1 to i
r = pd.Series(np.insert(random_returns, 0,0), index=index, name='Return')
#Generate series with deposits and withdrawals
w = [-dist if is_q_end else 0 for is_q_end in index[1:].is_quarter_end]
d = pd.Series(np.insert(w, 0, v0), index=index, name='Movements')
df = pd.concat([r, d], axis=1)
df['Value'] = list(gen(zip(df['Return'], df['Movements'])))
now, your code
#Generate some random security/account returns (decimal form) at monthly freq:
r = pd.Series(random_returns,
              index=pd.date_range('2017', freq='M', periods=12))
#Create an empty Series that will hold the monthly account values:
value = pd.Series(np.empty_like(r), index=r.index)
#Add a "start month" to value. This label will contain v0.
from pandas.tseries import offsets
# pd.concat is used because Series.append was removed in pandas 2.0
value = pd.concat([value, pd.Series(v0, index=[value.index[0] - offsets.MonthEnd(1)])]).sort_index()
#The loop I'd like to get rid of is here:
def loopy(value):
    for date in value.index[1:]:
        if date.is_quarter_end:
            value.loc[date] = value.loc[date - offsets.MonthEnd(1)] \
                              * (1 + r.loc[date]) - dist
        else:
            value.loc[date] = value.loc[date - offsets.MonthEnd(1)] \
                              * (1 + r.loc[date])
    return value
and comparing and timing
(loopy(value)==list(gen(zip(r, d)))).all()
Out[11]: True
returns same result
%timeit list(gen(zip(r, d)))
%timeit loopy(value)
10000 loops, best of 3: 72.4 µs per loop
100 loops, best of 3: 5.37 ms per loop
and the generator version is roughly 70 times faster here (72.4 µs versus 5.37 ms per loop). Hope it helps.
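The same running accumulation can also be written with `itertools.accumulate` from the standard library instead of a hand-written generator. This is only a sketch and has not been benchmarked against the versions above; it assumes Python 3.8+ for the `initial` argument, and the helper name `step` and the sample returns and movements are made up for illustration.

```python
from itertools import accumulate

def step(acu, pair):
    """Grow the balance by the period return, then apply the cash movement."""
    ret, movement = pair
    return acu * (1 + ret) + movement

returns = [0.0, 0.01, 0.02, 0.03]       # first entry is the start month
movements = [1000.0, 0.0, 0.0, -50.0]   # initial deposit first, withdrawal last

# accumulate emits initial=0.0 as its first item, so drop it.
values = list(accumulate(zip(returns, movements), step, initial=0.0))[1:]
print(values)
```

With a DataFrame like the one above, the pairs would come from `zip(df['Return'], df['Movements'])`.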