I can't figure out how the Filter function works so fast. I have used Filter on all sorts of data and regardless of data-type, Filter obliterates any alternative method I employ. I regularly use the Binary search algorithm and the QuickArraySort algorithm written by Stephen Bullen (found in Professional Excel Development). The Binary Search is lightning fast (as fast as the Filter function, given that the array is sorted) and the Quick Sort algorithm is one of the fastest sorting algorithms known.
I have written some test code below comparing speeds of finding a random element in a very large array (size = 2,000,000). I intentionally populate the array in an un-ordered fashion (it should be noted that I have tried various un-ordered assigning methods, and the results are similar regardless of assigning method).
Sub SearchTest()
Dim i As Long, strMyArray() As String, lngSize As Long, strTest As String
Dim TimeBinarySearch As Long, TimeFilterSearch As Long
Dim lngResultBinary As Long, lngResultFilter As Long
Dim StartHour As Long, StartMinute As Long, StartSecond As Long
Dim StartMiliSecond As Long, StartTime As Long
Dim EndHour As Long, EndMinute As Long, EndSecond As Long
Dim EndMiliSecond As Long, EndTime As Long
lngSize = 2000000
strTest = CStr(1735674 * 987)
ReDim strMyArray(lngSize)
For i = 1 To UBound(strMyArray)
If i Mod 2 = 0 Then
strMyArray(i) = CStr((i - 1) * 987)
Else
strMyArray(i) = CStr((i + 1) * 987)
End If
Next i
''Filter Test
'*******************************************************************
StartHour = Hour(Now()) * 60 * 60 * 1000
StartMinute = Minute(Now()) * 60 * 1000
StartSecond = Second(Now()) * 1000
StartMiliSecond = Format(Now(), "ms")
StartTime = StartHour + StartMinute + StartSecond + StartMiliSecond
lngResultFilter = CLng(Filter(strMyArray, strTest)(0))
EndHour = Hour(Now()) * 60 * 60 * 1000
EndMinute = Minute(Now()) * 60 * 1000
EndSecond = Second(Now()) * 1000
EndMiliSecond = Format(Now(), "ms")
EndTime = EndHour + EndMinute + EndSecond + EndMiliSecond
TimeFilterSearch = EndTime - StartTime
'*******************************************************************
''Binary Test
'*******************************************************************
StartHour = Hour(Now()) * 60 * 60 * 1000
StartMinute = Minute(Now()) * 60 * 1000
StartSecond = Second(Now()) * 1000
StartMiliSecond = Format(Now(), "ms")
StartTime = StartHour + StartMinute + StartSecond + StartMiliSecond
QuickSortString1D strMyArray
lngResultBinary = strMyArray(CLng(BinarySearchString(strTest, strMyArray)))
EndHour = Hour(Now()) * 60 * 60 * 1000
EndMinute = Minute(Now()) * 60 * 1000
EndSecond = Second(Now()) * 1000
EndMiliSecond = Format(Now(), "ms")
EndTime = EndHour + EndMinute + EndSecond + EndMiliSecond
TimeBinarySearch = EndTime - StartTime
'*******************************************************************
MsgBox lngResultFilter & vbCr & vbCr & lngResultBinary
MsgBox TimeFilterSearch & vbCr & vbCr & TimeBinarySearch
End Sub
Both methods return the same result, however the Filter method's return time is 0 ms and the QuickSort/BinarySearch method's return time is nearly 20 seconds. That is a huge difference!! As mentioned earlier, if the array is sorted the binary search method returns 0 ms as well (As most know, binary search requires that the array is sorted to begin with)
So, how can the Filter function look through 2,000,000 un-sorted entries and find the correct result immediately? It can't simply loop through every entry and compare it with the filtervalue (this is by far the slowest method), but based off of these preliminary test, it can't be utilizing a binary search either, as it would have to sort the array first. Even if there was an awesome sorting algorithm that was already compiled, I find it hard to believe that it could sort an array of size greater than a million instantaneously.
By the way, below is the QuickSort algorithm and the Binary Search algorithm.
Sub QuickSortString1D(ByRef saArray() As String, _
Optional ByVal bSortAscending As Boolean = True, _
Optional ByVal lLow1 As Variant, _
Optional ByVal lHigh1 As Variant)
'Dimension variables
Dim lLow2 As Long
Dim lHigh2 As Long
Dim sKey As String
Dim sSwap As String
On Error GoTo ErrorExit
'If not provided, sort the entire array
If IsMissing(lLow1) Then lLow1 = LBound(saArray)
If IsMissing(lHigh1) Then lHigh1 = UBound(saArray)
'Set new extremes to old extremes
lLow2 = lLow1
lHigh2 = lHigh1
'Get value of array item in middle of new extremes
sKey = saArray((lLow1 + lHigh1) \ 2)
'Loop for all the items in the array between the extremes
Do While lLow2 < lHigh2
If bSortAscending Then
'Find the first item that is greater than the mid-point item
Do While saArray(lLow2) < sKey And lLow2 < lHigh1
lLow2 = lLow2 + 1
Loop
'Find the last item that is less than the mid-point item
Do While saArray(lHigh2) > sKey And lHigh2 > lLow1
lHigh2 = lHigh2 - 1
Loop
Else
'Find the first item that is less than the mid-point item
Do While saArray(lLow2) > sKey And lLow2 < lHigh1
lLow2 = lLow2 + 1
Loop
'Find the last item that is greater than the mid-point item
Do While saArray(lHigh2) < sKey And lHigh2 > lLow1
lHigh2 = lHigh2 - 1
Loop
End If
'If the two items are in the wrong order, swap the rows
If lLow2 < lHigh2 Then
sSwap = saArray(lLow2)
saArray(lLow2) = saArray(lHigh2)
saArray(lHigh2) = sSwap
End If
'If the pointers are not together, advance to the next item
If lLow2 <= lHigh2 Then
lLow2 = lLow2 + 1
lHigh2 = lHigh2 - 1
End If
Loop
'Recurse to sort the lower half of the extremes
If lHigh2 > lLow1 Then
QuickSortString1D saArray, bSortAscending, lLow1, lHigh2
End If
'Recurse to sort the upper half of the extremes
If lLow2 < lHigh1 Then
QuickSortString1D saArray, bSortAscending, lLow2, lHigh1
End If
ErrorExit:
End Sub
'***********************************************************
' Comments: Uses a binary search algorithm to quickly locate
' a string within a sorted array of strings
'
' Arguments: sLookFor The string to search for in the array
' saArray An array of strings, sorted ascending
' lMethod Either vbBinaryCompare or vbTextCompare
' Defaults to vbTextCompare
' lNotFound The value to return if the text isn’t
' found. Defaults to -1
'
' Returns: Long The located position in the array,
' or lNotFound if not found
'
' Date Developer Action
' ———————————————————————————————-
' 02 Jun 04 Stephen Bullen Created
'
Function BinarySearchString(ByRef sLookFor As String, _
ByRef saArray() As String, _
Optional ByVal lMethod As VbCompareMethod = vbTextCompare, _
Optional ByVal lNotFound As Long = -1) As Long
Dim lLow As Long
Dim lMid As Long
Dim lHigh As Long
Dim lComp As Long
On Error GoTo ErrorExit
'Assume we didn’t find it
BinarySearchString = lNotFound
'Get the starting positions
lLow = LBound(saArray)
lHigh = UBound(saArray)
Do
'Find the midpoint of the array
lMid = (lLow + lHigh) \ 2
'Compare the mid-point element to the string being searched for
lComp = StrComp(saArray(lMid), sLookFor, lMethod)
If lComp = 0 Then
'We found it, so return the location and quit
BinarySearchString = lMid
Exit Do
ElseIf lComp = 1 Then
'The midpoint item is bigger than us - throw away the top half
lHigh = lMid - 1
Else
'The midpoint item is smaller than us - throw away the bottom half
lLow = lMid + 1
End If
'Continue until our pointers cross
Loop Until lLow > lHigh
ErrorExit:
End Function
Edit: It seems I should have done some "brute" force tests first. By simply looping through the array in a linear fashion as John Coleman suggests the Filter function performs, the return time for the same scenario above is 0 ms. See below:
Sub Test3()
Dim i As Long, strMyArray() As String, lngSize As Long, strTest As String
Dim lngResultBrute As Long, TimeBruteSearch As Long
lngSize = 2000000
strTest = CStr(936740 * 97)
ReDim strMyArray(lngSize)
For i = 1 To UBound(strMyArray)
If i Mod 2 = 0 Then
strMyArray(i) = CStr((i - 1) * 97)
Else
strMyArray(i) = CStr((i + 1) * 97)
End If
Next i
StartTime = Timer
' Brute force search
For i = 1 To UBound(strMyArray)
If strMyArray(i) = strTest Then
lngResultBrute = CLng(strTest)
Exit For
End If
Next i
EndTime = Timer
TimeBruteSearch = EndTime - StartTime
MsgBox TimeBruteSearch
End Sub
I believe you are comparing apples and oranges here. It looks like when you test the Filter
function you take an unordered array as input then use Filter
to find matches to a test value. Intuition says that's O(N) = 2 million operations --- you test each array element once. Then you're done.
When you filter using the custom VBA functions, you first sort, which is pretty expensive O(N * Log2(N)) = 29 million. Once the array is sorted, you do get the benefit of searching an ordered array, which is O(Log2(N)) = 14. Even though you speeded up the search mightily, the penalty of sorting kills you.
Hope that helps.
Filter
does use a linear search -- it just executes it lightening quick because it is implemented in highly optimized C/C++ code. To see this, run the following code:
Function RandString(n As Long) As String
'returns a random string in B-Z
Dim i As Long
Dim s As String
For i = 1 To n
s = s & Chr(66 + Int(25 * Rnd()))
Next i
RandString = s
End Function
Sub test()
Dim times(1 To 20) As Double
Dim i As Long, n As Long
Dim A() As String
Dim start As Double
Dim s As String
Randomize
s = RandString(99)
ReDim A(1 To 2000000)
For i = 1 To 2000000
A(i) = s + RandString(1)
Next i
s = s & "A"
For i = 20 To 1 Step -1
n = i * 100000
ReDim Preserve A(1 To n)
start = Timer
Debug.Print UBound(Filter(A, s)) 'should be -1
times(i) = Timer - start
Next i
For i = 1 To 20
Cells(i, 1) = i
Cells(i, 2) = times(i)
Next i
End Sub
This code creates an array of 2,000,000 random strings of length 100, each of which differs from the target string in the last position. Then it feeds subarrays whose sizes are multiples of 100,000 into Filter
, timing the time it takes. The output looks like this:
The clear linear trend doesn't exactly prove but is strong evidence that VBA's Filter
is executing a straightforward linear search.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With