Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Performance Considerations when using VBA Filter Function

I can't figure out how the Filter function works so fast. I have used Filter on all sorts of data and regardless of data-type, Filter obliterates any alternative method I employ. I regularly use the Binary search algorithm and the QuickArraySort algorithm written by Stephen Bullen (found in Professional Excel Development). The Binary Search is lightning fast (as fast as the Filter function, given that the array is sorted) and the Quick Sort algorithm is one of the fastest sorting algorithms known.

I have written some test code below comparing speeds of finding a random element in a very large array (size = 2,000,000). I intentionally populate the array in an un-ordered fashion (it should be noted that I have tried various un-ordered assigning methods, and the results are similar regardless of assigning method).

Sub SearchTest()

Dim i As Long, strMyArray() As String, lngSize As Long, strTest As String
Dim TimeBinarySearch As Long, TimeFilterSearch As Long
Dim lngResultBinary As Long, lngResultFilter As Long

Dim StartHour As Long, StartMinute As Long, StartSecond As Long
Dim StartMiliSecond As Long, StartTime As Long

Dim EndHour As Long, EndMinute  As Long, EndSecond As Long
Dim EndMiliSecond As Long, EndTime As Long

    lngSize = 2000000

    strTest = CStr(1735674 * 987)

    ReDim strMyArray(lngSize)

    For i = 1 To UBound(strMyArray)
        If i Mod 2 = 0 Then
            strMyArray(i) = CStr((i - 1) * 987)
        Else
            strMyArray(i) = CStr((i + 1) * 987)
        End If
    Next i

''Filter Test
'*******************************************************************
    StartHour = Hour(Now()) * 60 * 60 * 1000
    StartMinute = Minute(Now()) * 60 * 1000
    StartSecond = Second(Now()) * 1000
    StartMiliSecond = Format(Now(), "ms")

    StartTime = StartHour + StartMinute + StartSecond + StartMiliSecond

    lngResultFilter = CLng(Filter(strMyArray, strTest)(0))

    EndHour = Hour(Now()) * 60 * 60 * 1000
    EndMinute = Minute(Now()) * 60 * 1000
    EndSecond = Second(Now()) * 1000
    EndMiliSecond = Format(Now(), "ms")

    EndTime = EndHour + EndMinute + EndSecond + EndMiliSecond

    TimeFilterSearch = EndTime - StartTime
'*******************************************************************

''Binary Test
'*******************************************************************
    StartHour = Hour(Now()) * 60 * 60 * 1000
    StartMinute = Minute(Now()) * 60 * 1000
    StartSecond = Second(Now()) * 1000
    StartMiliSecond = Format(Now(), "ms")

    StartTime = StartHour + StartMinute + StartSecond + StartMiliSecond

    QuickSortString1D strMyArray

    lngResultBinary = strMyArray(CLng(BinarySearchString(strTest, strMyArray)))

    EndHour = Hour(Now()) * 60 * 60 * 1000
    EndMinute = Minute(Now()) * 60 * 1000
    EndSecond = Second(Now()) * 1000
    EndMiliSecond = Format(Now(), "ms")

    EndTime = EndHour + EndMinute + EndSecond + EndMiliSecond

    TimeBinarySearch = EndTime - StartTime
'*******************************************************************

    MsgBox lngResultFilter & vbCr & vbCr & lngResultBinary 

    MsgBox TimeFilterSearch & vbCr & vbCr & TimeBinarySearch

End Sub

Both methods return the same result, however the Filter method's return time is 0 ms and the QuickSort/BinarySearch method's return time is nearly 20 seconds. That is a huge difference!! As mentioned earlier, if the array is sorted the binary search method returns 0 ms as well (As most know, binary search requires that the array is sorted to begin with)

So, how can the Filter function look through 2,000,000 un-sorted entries and find the correct result immediately? It can't simply loop through every entry and compare it with the filtervalue (this is by far the slowest method), but based off of these preliminary test, it can't be utilizing a binary search either, as it would have to sort the array first. Even if there was an awesome sorting algorithm that was already compiled, I find it hard to believe that it could sort an array of size greater than a million instantaneously.

By the way, below is the QuickSort algorithm and the Binary Search algorithm.

    Sub QuickSortString1D(ByRef saArray() As String, _
                Optional ByVal bSortAscending As Boolean = True, _
                Optional ByVal lLow1 As Variant, _
                Optional ByVal lHigh1 As Variant)

    'Dimension variables
    Dim lLow2 As Long
    Dim lHigh2 As Long
    Dim sKey As String
    Dim sSwap As String

        On Error GoTo ErrorExit
        'If not provided, sort the entire array
        If IsMissing(lLow1) Then lLow1 = LBound(saArray)
        If IsMissing(lHigh1) Then lHigh1 = UBound(saArray)

        'Set new extremes to old extremes
        lLow2 = lLow1
        lHigh2 = lHigh1

        'Get value of array item in middle of new extremes
        sKey = saArray((lLow1 + lHigh1) \ 2)

        'Loop for all the items in the array between the extremes
        Do While lLow2 < lHigh2

            If bSortAscending Then
                'Find the first item that is greater than the mid-point item
                Do While saArray(lLow2) < sKey And lLow2 < lHigh1
                    lLow2 = lLow2 + 1
                Loop

                'Find the last item that is less than the mid-point item
                Do While saArray(lHigh2) > sKey And lHigh2 > lLow1
                    lHigh2 = lHigh2 - 1
                Loop
            Else
                'Find the first item that is less than the mid-point item
                Do While saArray(lLow2) > sKey And lLow2 < lHigh1
                    lLow2 = lLow2 + 1
                Loop

                'Find the last item that is greater than the mid-point item
                Do While saArray(lHigh2) < sKey And lHigh2 > lLow1
                    lHigh2 = lHigh2 - 1
                Loop

            End If

            'If the two items are in the wrong order, swap the rows
            If lLow2 < lHigh2 Then
                sSwap = saArray(lLow2)
                saArray(lLow2) = saArray(lHigh2)
                saArray(lHigh2) = sSwap
            End If

            'If the pointers are not together, advance to the next item
            If lLow2 <= lHigh2 Then
                lLow2 = lLow2 + 1
                lHigh2 = lHigh2 - 1
            End If
        Loop

        'Recurse to sort the lower half of the extremes
        If lHigh2 > lLow1 Then
            QuickSortString1D saArray, bSortAscending, lLow1, lHigh2
        End If

        'Recurse to sort the upper half of the extremes
        If lLow2 < lHigh1 Then
            QuickSortString1D saArray, bSortAscending, lLow2, lHigh1
        End If

    ErrorExit:

    End Sub

    '***********************************************************
    ' Comments: Uses a binary search algorithm to quickly locate
    ' a string within a sorted array of strings
    '
    ' Arguments: sLookFor The string to search for in the array
    ' saArray An array of strings, sorted ascending
    ' lMethod Either vbBinaryCompare or vbTextCompare
    ' Defaults to vbTextCompare
    ' lNotFound The value to return if the text isn’t
    ' found. Defaults to -1
    '
    ' Returns: Long The located position in the array,
    ' or lNotFound if not found
    '
    ' Date Developer Action
    ' ———————————————————————————————-
    ' 02 Jun 04 Stephen Bullen Created
    '
    Function BinarySearchString(ByRef sLookFor As String, _
                ByRef saArray() As String, _
                Optional ByVal lMethod As VbCompareMethod = vbTextCompare, _
                Optional ByVal lNotFound As Long = -1) As Long

    Dim lLow As Long
    Dim lMid As Long
    Dim lHigh As Long
    Dim lComp As Long

        On Error GoTo ErrorExit

        'Assume we didn’t find it
        BinarySearchString = lNotFound

        'Get the starting positions
        lLow = LBound(saArray)
        lHigh = UBound(saArray)

        Do
            'Find the midpoint of the array
            lMid = (lLow + lHigh) \ 2

            'Compare the mid-point element to the string being searched for
            lComp = StrComp(saArray(lMid), sLookFor, lMethod)

            If lComp = 0 Then
                'We found it, so return the location and quit
                BinarySearchString = lMid
                Exit Do
            ElseIf lComp = 1 Then
                'The midpoint item is bigger than us - throw away the top half
                lHigh = lMid - 1
            Else
                'The midpoint item is smaller than us - throw away the bottom half
                lLow = lMid + 1
            End If

            'Continue until our pointers cross
        Loop Until lLow > lHigh

    ErrorExit:

    End Function

Edit: It seems I should have done some "brute" force tests first. By simply looping through the array in a linear fashion as John Coleman suggests the Filter function performs, the return time for the same scenario above is 0 ms. See below:

Sub Test3()

Dim i As Long, strMyArray() As String, lngSize As Long, strTest As String
Dim lngResultBrute As Long, TimeBruteSearch As Long

    lngSize = 2000000
    strTest = CStr(936740 * 97)
    ReDim strMyArray(lngSize)

    For i = 1 To UBound(strMyArray)
        If i Mod 2 = 0 Then
            strMyArray(i) = CStr((i - 1) * 97)
        Else
            strMyArray(i) = CStr((i + 1) * 97)
        End If
    Next i

    StartTime = Timer

    ' Brute force search
    For i = 1 To UBound(strMyArray)
        If strMyArray(i) = strTest Then
            lngResultBrute = CLng(strTest)
            Exit For
        End If
    Next i

    EndTime = Timer

    TimeBruteSearch = EndTime - StartTime
    MsgBox TimeBruteSearch

End Sub
like image 244
Joseph Wood Avatar asked Sep 27 '22 09:09

Joseph Wood


2 Answers

I believe you are comparing apples and oranges here. It looks like when you test the Filter function you take an unordered array as input then use Filter to find matches to a test value. Intuition says that's O(N) = 2 million operations --- you test each array element once. Then you're done.

When you filter using the custom VBA functions, you first sort, which is pretty expensive O(N * Log2(N)) = 29 million. Once the array is sorted, you do get the benefit of searching an ordered array, which is O(Log2(N)) = 14. Even though you speeded up the search mightily, the penalty of sorting kills you.

Hope that helps.

like image 79
xidgel Avatar answered Sep 28 '22 23:09

xidgel


Filter does use a linear search -- it just executes it lightening quick because it is implemented in highly optimized C/C++ code. To see this, run the following code:

Function RandString(n As Long) As String
    'returns a random string in B-Z
    Dim i As Long
    Dim s As String
    For i = 1 To n
        s = s & Chr(66 + Int(25 * Rnd()))
    Next i
    RandString = s
End Function

Sub test()
    Dim times(1 To 20) As Double
    Dim i As Long, n As Long
    Dim A() As String
    Dim start As Double
    Dim s As String
    Randomize
    s = RandString(99)
    ReDim A(1 To 2000000)
    For i = 1 To 2000000
        A(i) = s + RandString(1)
    Next i
    s = s & "A"
    For i = 20 To 1 Step -1
        n = i * 100000
        ReDim Preserve A(1 To n)
        start = Timer
        Debug.Print UBound(Filter(A, s)) 'should be -1
        times(i) = Timer - start
    Next i
    For i = 1 To 20
        Cells(i, 1) = i
        Cells(i, 2) = times(i)
    Next i
End Sub

This code creates an array of 2,000,000 random strings of length 100, each of which differs from the target string in the last position. Then it feeds subarrays whose sizes are multiples of 100,000 into Filter, timing the time it takes. The output looks like this:

enter image description here

The clear linear trend doesn't exactly prove but is strong evidence that VBA's Filter is executing a straightforward linear search.

like image 37
John Coleman Avatar answered Sep 28 '22 21:09

John Coleman