Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Excel UDF weighted RANDBETWEEN()

Well not really RANDBETWEEN(). I'm trying to create a UDF to return the index of a number within an array, where the larger the number the more likely it is to be chosen.

I know how to assign probabilities to random numbers in a worksheet (i.e. using MATCH() on the sum of the probabilities, there's plenty of stuff on SO explaining that), but I want a UDF because I'm passing a special input array into the function - not just a selected range.

My issue is, the weighting is off, numbers later on in the array are more likely to be returned than those earlier in the array and I can't see where in my code I've gone wrong. Here's the UDF so far:

Public Function PROBABLE(ParamArray inputArray() As Variant) As Long
'Takes a set of relative or absolute probabilities and ranks a random number within them
Application.Volatile (True)
Dim outputArray() As Variant
Dim scalar As Single
Dim rankNum As Single
Dim runningTot As Single

'''''
'Here I take inputArray() and convert to outputArray(), 
'which is fed into the probability code below
'''''

scalar = 1 / WorksheetFunction.Sum(outputArray)
rankNum = Rnd()
runningTot = 0

For i = 0 To UBound(outputArray)
    runningTot = runningTot + outputArray(i)
    If runningTot * scalar >= rankNum Then
        PROBABLE = i + 1
        Exit Function
    End If
Next i

End Function

The function should look at the relative sizes of the numbers in outputArray() and pick randomly but weighted towards the larger numbers. E.g. outputArray() of {1,0,0,1} should assign probabilities respectively of {50%,0%,0%,50%} However when I tested that outputArray(), for 1000 samples and 100 iterations, and graphed how frequently item 1 or item 4 in the array was returned, I got this result:Graph

Approximately 20%:80% distribution. Plotting {1,1,1,1} (all should have equal chance) gave a 10%:20%:30%:40% distribution

I know I'm missing something obvious but I can't tell what, any help?

UPDATE

Some people were asking for the complete code, here it is.

Public Function PROBABLE(ParamArray inputArray() As Variant) As Long
'Takes a set of relative or absolute probabilities and ranks a random number within them
Application.Volatile (True) 'added some dimensions up here
Dim outputArray() As Variant
Dim inElement As Variant
Dim subcell As Variant
Dim scalar As Single
Dim rankNum As Single
Dim runningTot As Single
'convert ranges to values
'creating a new array from the mixture of ranges and values in the input array
''''
'This is where I create outputArray() from inputArray()
''''
ReDim outputArray(0)
For Each inElement In inputArray
'Normal values get copied from the input UDF to an output array, ranges get split up then appended
    If TypeName(inElement) = "Range" Or TypeName(inElement) = "Variant()" Then
        For Each subcell In inElement
            outputArray(UBound(outputArray)) = subcell
            ReDim Preserve outputArray(UBound(outputArray) + 1)
        Next subcell
    'Stick the element on the end of an output array
    Else
        outputArray(UBound(outputArray)) = inElement
        ReDim Preserve outputArray(UBound(outputArray) + 1)
    End If
Next inElement
ReDim Preserve outputArray(UBound(outputArray) - 1)
''''
'End of new code, the rest is as before
''''
scalar = 1 / WorksheetFunction.Sum(outputArray)
rankNum = Rnd()
runningTot = 0

For i = 0 To UBound(outputArray)
    runningTot = runningTot + outputArray(i)
    If runningTot * scalar >= rankNum Then
        PROBABLE = i + 1
        Exit Function
    End If
Next i

End Function

The start inputArray() 🡒 outputArray() section is used to standardise different input methods. I.e. the user can enter a mixture of values, cell references/ranges and arrays, and the function can cope. e.g. {=PROBABLE(A1,5,B1:C15,IF(ISTEXT(D1:D3),LEN(D1:D3),0))} (you get the picture) should work just as well as =PROBABLE(A1:A3). I cycle through the sub-elements of the inputArray() and put them in my outputArray(). I'm fairly certain there's nothing wrong with this portion of code.

Then to get my results, I copied the UDF into A1:A1000, used a COUNTIF(A1:A1000,1)or instead of count 1, I did count 2, 3, 4 etc for each of the possible UDF outputs and made a short macro to recalculate the sheet 100 times, each time copying the result of the countif into a table to graph. I can't say precisely how I did that because I left this all at work, but I'll update on Monday.

like image 453
Greedo Avatar asked Feb 10 '17 12:02

Greedo


3 Answers

Try this:

Function Probable(v As Variant) As Long
    Application.Volatile 'remove this if you don't want a volatile function

    Dim v2 As Variant
    ReDim v2(LBound(v) To UBound(v) + 1)

    v2(LBound(v2)) = 0
    Dim i As Integer
    For i = LBound(v) To UBound(v)
        v2(i + 1) = v2(i) + v(i) / Application.Sum(v)
    Next i

    Probable = Application.WorksheetFunction.Match(Rnd(), v2, 1)
End Function

The array v is essentially your outputArray.

The code takes an array like {1,0,0,1} and converts it to {0,0.5,0.5,1} (note the 0 at the beginning) at which point you can do a MATCH as you suggested to get to get either a 1 or 4 with equal probability.

Similarly, if you were to start with {1,1,1,1} it would be converted to {0,0.25,0.5,0.75,1} and return any of 1, 2, 3 or 4 with equal probability.

Also note: you could probably make it a bit quicker if you save the value of Application.Sum(v) in a variable rather than performing the calculation for every value in array v.

Update
The function now takesv as a parameter -- like your code. I also tweaked it a bit so that it can deal with v having any base, which means you can run it from the worksheet too: =Probable({1,0,0,1}) for example

like image 142
CallumDA Avatar answered Oct 21 '22 06:10

CallumDA


It appears I have made a tragic mistake. My code was fine, my counting wasn't so good. I was using SUMIF() instead of COUNTIF() in my graphing, resulting in later objects in the array (with a higher Index - the output of the UDF which I was supposed to be counting but was instead summing) getting a weighting proportional to their position.

In retrospect, I think someone far more clever than I could probably have deduced that from the information given. I said {1,1,1,1} has a {10%:20%:30%:40%},that's a {1:2:3:4} ratio, which is precisely the same ratio as the indices of the outputs, deduction: the outputs were summed not counted.

Similarly, the graph of {1,0,0,1} with a {20%:0%:0%:80%} output, well divide each percentage by it's index (20%/1, 80%/4) and Hey Presto {20%:0%:0%:20%}, or the 1:1 ratio I had expected.

Something annoying but satisfying in that - knowing the answer was there all along. I suppose there's probably a moral in all this. At least the post can serve as a warning to budding VBAers to check their arithmetic.

like image 41
Greedo Avatar answered Oct 21 '22 06:10

Greedo


This is something I have built, following your logic. It works quite ok, providing different results.

Option Explicit
Public Function TryMyRandom() As String

    Dim lngTotalChances         As Long
    Dim i                       As Long
    Dim previousValue           As Long
    Dim rnd                     As Long
    Dim result                  As Variant

    Dim varLngInputArray        As Variant
    Dim varLngInputChances      As Variant
    Dim varLngChancesReedit     As Variant

    varLngInputChances = Array(1, 2, 3, 4, 5)
    varLngInputArray = Array("a", "b", "c", "d", "e")
    lngTotalChances = Application.WorksheetFunction.Sum(varLngInputChances)
    rnd = Application.WorksheetFunction.RandBetween(1, lngTotalChances)

    ReDim varLngChancesReedit(UBound(varLngInputChances))

    For i = LBound(varLngInputChances) To UBound(varLngInputChances)
        varLngChancesReedit(i) = varLngInputChances(i) + previousValue
        previousValue = varLngChancesReedit(i)

        If rnd <= varLngChancesReedit(i) Then
            result = varLngInputArray(i)
            Exit For
        End If
    Next i

    TryMyRandom = result

End Function

Public Sub TestMe()

    Dim lng     As Long
    Dim i       As Long
    Dim dict    As Object
    Dim key     As Variant
    Dim res     As String

    Set dict = CreateObject("Scripting.Dictionary")

    For lng = 1 To 1000

        res = TryMyRandom
        If dict.Exists(res) Then
            dict(res) = dict(res) + 1
        Else
            dict(res) = 1
        End If


    Next lng

    For Each key In dict.Keys
        Debug.Print key & " ===> " & dict(key)
    Next


End Sub

Concerning your case, make sure that the array is sorted. E.g., in my case speaking about varLngInputChances. I have not taken a look at the corner cases, there can be an error there, possibly.

Run the TestMe sub. It will generate even a summary of the results. If you change the variations to varLngInputChances = Array(1, 1, 0, 0, 1), it gives:

a ===> 329 b ===> 351 e ===> 320

which is quite good random :) You can change the number of the sample here: For lng = 1 To 1000, it works quite fast. I have just tried it with 100,000 tests.

like image 36
Vityata Avatar answered Oct 21 '22 06:10

Vityata