In quantstrat package I have located one of the main culprits for slowness of the applyRule function and wonder if there is more efficient to write the while loop. Any feedback would be helpful. For anyone experience wrapping this part into Parallel R.
As an option apply would work instead while? Or should I re-write this part into new function such as ruleProc and nextIndex? I am also dveling on Rcpp but that may be a streach. Any help and constructive advice is much appreciated?
while (curIndex) {
timestamp = Dates[curIndex]
if (isTRUE(hold) & holdtill < timestamp) {
hold = FALSE
holdtill = NULL
}
types <- sort(factor(names(strategy$rules), levels = c("pre",
"risk", "order", "rebalance", "exit", "enter", "entry",
"post")))
for (type in types) {
switch(type, pre = {
if (length(strategy$rules[[type]]) >= 1) {
ruleProc(strategy$rules$pre, timestamp = timestamp,
path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
symbol = symbol, ruletype = type, mktinstr = mktinstr)
}
}, risk = {
if (length(strategy$rules$risk) >= 1) {
ruleProc(strategy$rules$risk, timestamp = timestamp,
path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
symbol = symbol, ruletype = type, mktinstr = mktinstr)
}
}, order = {
if (length(strategy$rules[[type]]) >= 1) {
ruleProc(strategy$rules[[type]], timestamp = timestamp,
path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
symbol = symbol, ruletype = type, mktinstr = mktinstr,)
} else {
if (isTRUE(path.dep)) {
timespan <- paste("::", timestamp, sep = "")
} else timespan = NULL
ruleOrderProc(portfolio = portfolio, symbol = symbol,
mktdata = mktdata, timespan = timespan)
}
}, rebalance = , exit = , enter = , entry = {
if (isTRUE(hold)) next()
if (type == "exit") {
if (getPosQty(Portfolio = portfolio, Symbol = symbol,
Date = timestamp) == 0) next()
}
if (length(strategy$rules[[type]]) >= 1) {
ruleProc(strategy$rules[[type]], timestamp = timestamp,
path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
symbol = symbol, ruletype = type, mktinstr = mktinstr)
}
if (isTRUE(path.dep) && length(getOrders(portfolio = portfolio,
symbol = symbol, status = "open", timespan = timestamp,
which.i = TRUE))) {
}
}, post = {
if (length(strategy$rules$post) >= 1) {
ruleProc(strategy$rules$post, timestamp = timestamp,
path.dep = path.dep, mktdata = mktdata, portfolio = portfolio,
symbol = symbol, ruletype = type, mktinstr = mktinstr)
}
})
}
if (isTRUE(path.dep))
curIndex <- nextIndex(curIndex)
else curIndex = FALSE
}
Garrett's answer does point to the last major discussion on the R-SIG-Finance list where a related question was discussed.
The applyRules function in quantstrat is absolutely where most of the time is spent.
The while loop code copied in this question is the path-dependent part of the applyRules execution. I believe all of this is covered in the documentation, but I'll briefly review for SO posterity.
We construct a dimension reduction index inside applyRules so that we don't have to observe every timestamp and check it. We only take note of specific points in time where the strategy may reasonably be expected to act on the order book, or where orders may reasonably be expected to get filled.
This is state-dependent and path-dependent code. Idle talk of 'vectorization' doesn't make any sense in this context. If I need to know the current state of the market, the order book, and my position, and if my orders may be modified in a time-dependent manner by other rules, I don't see how this code can be vectorized.
From a computer science perspective, this is a state machine. State machines in almost every language I can think of are usually written as while loops. This isn't really negotiable or changeable.
The question asks if use of apply would help. apply statements in R are implemented as loops, so no, it wouldn't help. Even a parallel apply such as mclapply or foreach can't help because this is inside a state dependent part of the code. Evaluating different time points without regard to state doesn't make any sense. You'll note that the non-state-dependent parts of quantstrat are vectorized wherever possible, and account for very little of the running time.
The comment made by John suggests removing the for loop in ruleProc. All that the for loop does is check each rule associated with the strategy at this point in time. The only compute-intensive part of that loop is the do.call to call the rule function. The rest of the for loop is simply locating and matching arguments for these functions, and from code profiling, doesn't take much time at all. It would not make much sense to use a parallel apply here either, since the rule functions are applied in type order, so that cancels or risk directives can be applied before new entry directives. Much as mathematics has an order of operations, or a bank has a deposit/withdrawal processing order, quantstrat has a rule type evaluation order, as laid out in the documentation.
To speed up execution, there are four main things that can be done:
I can assure you that most strategies can run in a fraction of a core-minute per symbol per day per core on tick data, if a little care is taken to the signal generation functions. Running large backtests on a laptop is not recommended.
Ref: quantstrat - applyRules
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With