Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R - How to Speed Up Recursion and Double Summation

Tags:

r

recursion

Since this is essentially a question about how to efficiently perform a computation in R, I will start with the equation and then provide an explanation for the problem after the code for those who would find it useful or interesting.

I have written a script in R to generate values using the following function:

Success probability when dice explode

The function, as you can see, is recursive and involves double summation. It works well for small numbers around 15 or lower, but the execution time gets prohibitively long at higher values of n and t. I need to be able to perform the calculation for every n and t pair from 1 to 30. Is there a way to write a script that won't take months to execute?

My current script is:

explProb <- function(n,t) {
    prob <- 0

    #################################
    # FIRST PART - SINGLE SUMMATION
    #################################
    i <- 0
    if(t<=n) {
        i <- c(t:n)
    }
    prob = sum(choose(n,i[i>0])*((1/3)^(i[i>0]))*((2/3)^(n-i[i>0])))

    #################################
    # SECOND PART - DOUBLE SUMMATION
    #################################
    if(t >= 2) {
        for(k in 1:(t-1)) {
            j <- c(0:(k-1))
            prob = prob + sum(choose(n,n-k)*((1/6)^(j))*((1/6)^(k-j))*((2/3)^(n-k))*explProb(k-j,t-k))
        }
    }

    return(prob)
}
MAX_DICE = 30
MAX_THRESHOLD = 30
probabilities = matrix(0,MAX_DICE,MAX_THRESHOLD)

for(dice in 1:MAX_DICE) {
    for(threshold in 1:MAX_THRESHOLD) {
        #print(sprintf("DICE = %d : THRESH = %d", dice, threshold))
        probabilities[dice,threshold] = explProb(dice,threshold)
    }
}

I am trying to write a script to generate a set of probabilities for a particular type of dice roll in a tabletop roleplaying game (Shadowrun 5th Edition, to be specific). The type of dice roll is called an "Exploding Dice Roll". In case you are not familiar with how these rolls work in this game, let me briefly explain.

Whenever you try to accomplish a task you make a test by rolling a number of six-sided dice. Your goal is to get a predetermined number "hits" when rolling those dice. A "hit" is defined as a 5 or 6 on a six-sided die. So, for example, if you have a dice pool of 5 dice, and you roll: 1, 3, 3, 5, 6 then you have gotten 2 hits.

In some cases you are allowed to re-roll all of the 6's that were rolled in order to try and get MORE hits.This is called an "exploding" roll. The 6's counts as hits, but can be re-rolled to "explode" into even more hits. For clarification I'll give a quick example...

If you roll 10 dice and get a result of 1, 2, 2, 4, 5, 5, 6, 6, 6, 6 then you have gotten 6 hits on the first roll... However, the 4 dice that rolled 6's can be re-rolled again. If you roll those dice and get 3, 5, 6, 6 then you have 3 more hits for a total of 9 hits. But you can now re-roll the two more sixes you got... etc... You keep re-rolling the sixes, adding the 5's and 6's to your total hits, and keep going until you get a roll with no sixes.

The function listed above generates these probabilities taking an input of "# of dice" and "number of hits" (called a "threshold" here).

n = # of Dice being rolled t = Threshold number of "hits" to be reached

like image 634
Funkytown Avatar asked Aug 25 '15 13:08

Funkytown


1 Answers

Efficient Calculation Using Probability Distributions

The approach in the question and in my other answer use sums over transitions of dependent binomial distributions. The dependency arising from the carry over of previous successes (hits) to subsequent trials (rolls) complicates the calculations.

An alternative approach is to view each die separately. Roll a single die as long as it turns up as a hit. Each die is independent of the other, so the random variables may be summed efficiently through convolution. However, the distribution for each die is a geometric distribution, and the sum of independent geometric distributions gives rise to a negative binomial distribution.

R provides the negative binomial distribution, so the results obtained in my other answer may be had all at once by

round(dnbinom(0:19,10,prob=2/3),4)
 [1] 0.0173 0.0578 0.1060 0.1413 0.1531 0.1429 0.1191 0.0907 0.0643 0.0428
[11] 0.0271 0.0164 0.0096 0.0054 0.0030 0.0016 0.0008 0.0004 0.0002 0.0001

The probability matrix in the question, with MAX_DICE=MAX_THRESHOLD=10, has first column equal to

1-dnbinom(0,1:10,prob=2/3)

So, you might be looking for the cumulative distribution function. I have not been able to figure out your intentions with the subsequent columns, but perhaps the goal was

outer(1:10,0:10,function(size,x) 1-dnbinom(x,size,prob=2/3))
like image 148
A. Webb Avatar answered Nov 10 '22 01:11

A. Webb