Final final results:
I was wondering whether the results below changed if the string was longer. I ran exactly the same tests on the same computer, except each cell had a random string of 34 characters rather than four. These were the results:
Comintern (Regexp): 136.1 ms
brettdj (Regexp): 139.9 ms
Slai (Regexp): 158.4 ms
*Original Regex: 161.0 ms*
Comintern (AN): 170.1 ms
Comintern (Hash): 183.6 ms
ThunderFrame: 232.9 ms
*Original replace: 372.9 ms*
*Original InStr: 478.1 ms*
CallumDA33: 1218.1 ms
This really shows the speed of the Regex - all the solutions utilising Regex.replace are significantly faster, with the best being Comintern's implementation.
In summary, if the strings are long, use arrays, if they are short, use the clipboard. If unsure, the optimal result is to use arrays, but this may sacrifice a little performance on short strings.
Final results:
Thanks very much for all of your suggestions, clearly I still have a lot to learn. I was thinking about this all yesterday, so I decided to rerun everything at home. Here are the final results, based on applying each of these to 30,000 four character strings.
My computer at home is an Intel i7 @ 3.6 GHz, 8GB RAM, 64-bit Windows 10 and Excel 2016. Similar conditions to before in that I have processes running in the background, but I'm not actively doing anything throughout the tests.
Original replace: 97.67 ms
Original InStr: 106.54 ms
Original Regex: 113.46 ms
ThunderFrame: 82.21 ms
Comintern (AN): 96.98 ms
Comintern (OR): 81.87 ms
Comintern (Hash): 101.18 ms
brettdj: 81.66 ms
CallumDA33: 201.64 ms
Slai: 68.38 ms
I've therefore accepted Slai's answer as it is clearly the fastest for general implementation, but I'll rerun them all at work against the actual data to check this still works.
Original post:
I have an array in Excel that is a list of part numbers. I need to turn every member of the array alphanumeric, for example
ABC123-001 -> ABC123001
ABC123/001 -> ABC123001
ABC123001 -> ABC123001
What is the fastest way of doing this?
For context, our part numbers can come in different forms, so I'm writing a function that finds the best match within a given range. At the moment, the part of the function that makes everything alphanumeric takes about 50ms to run, whereas the rest of the function takes around 30ms in total. I also can't avoid using Excel.
I've done some work myself (see answer below), but the main issue is that I have to loop through every element of the array one-by-one - could there be a better way? I've also never run tests before, so any feedback on improving them would be much appreciated.
Here is what I've tried so far.
I'm using MicroTimer and my computer has an Intel i5 @2.5GHz, 4GB of RAM, 64-bit Windows 7. I've got processes running in the background, but I'm not actively doing anything else whilst these are run.
I created 30,000 lines of random symbols using this code:
=CHAR(RANDBETWEEN(1,60))&CHAR(RANDBETWEEN(48,57))&CHAR(RANDBETWEEN(37,140))&CHAR(RANDBETWEEN(37,140))
(note how we stop the first character at 60 because '=' is char(61)
and we want to avoid Excel interpreting this as a formula. Also we force the second character to be a number so we can guarantee at least one alphanumeric character in there.)
1. Using a loop based on cases. Average time: 175ms
Using the function in this post, we load the range into an array, apply the function to each element of the array and paste it back. Code:
Function AlphaNumericOnly(strSource As Variant) As String
Dim i As Integer
Dim strResult As String
For i = 1 To Len(strSource)
Select Case Asc(Mid(strSource, i, 1))
Case 48 To 57, 65 To 90, 97 To 122: 'include 32 if you want to include space
strResult = strResult & Mid(strSource, i, 1)
End Select
Next
AlphaNumericOnly = strResult
End Function
Sub Replace()
Dim inputSh As Worksheet
Dim inputRng As Range
Set inputSh = Sheets("Data")
Set inputRng = inputSh.Range("A1:A30000")
Dim outputSh As Worksheet
Dim outputRng As Range
Set outputSh = Sheets("Replace")
Set outputRng = outputSh.Range("A1:A30000")
Dim time1 As Double, time2 As Double
time1 = MicroTimer
Dim arr As Variant
arr = inputRng
Dim i As Integer
For i = LBound(arr) To UBound(arr)
arr(i, 1) = AlphaNumericOnly(arr(i, 1))
Next i
outputRng = arr
time2 = MicroTimer
Debug.Print (time2 - time1) * 1000
End Sub
2. Using InStr() to check each character. Average time: 201ms
Define a string of valid values. Check one-by-one if the valid values appear in the array elements:
Sub InStr()
Dim inputSh As Worksheet
Dim inputRng As Range
Set inputSh = Sheets("Data")
Set inputRng = inputSh.Range("A1:A30000")
Dim outputSh As Worksheet
Dim outputRng As Range
Set outputSh = Sheets("InStr")
Set outputRng = outputSh.Range("A1:A30000")
Dim time1 As Double, time2 As Double
time1 = MicroTimer
Dim arr As Variant
arr = inputRng
Dim validValues As String
validValues = "01234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" 'put numbers and capitals at the start as they are more likely'
Dim i As Integer, j As Integer
Dim result As String
For i = LBound(arr) To UBound(arr)
result = vbNullString
For j = 1 To Len(arr(i, 1))
If InStr(validValues, Mid(arr(i, 1), j, 1)) <> 0 Then
result = result & Mid(arr(i, 1), j, 1)
End If
Next j
arr(i, 1) = result
Next i
outputRng = arr
time2 = MicroTimer
Debug.Print (time2 - time1) * 1000
End Sub
3. Using regex.Replace on the array. Time: 171ms
Define a regex and use this to replace each element of the array.
Sub Regex()
Dim inputSh As Worksheet
Dim inputRng As Range
Set inputSh = Sheets("Data")
Set inputRng = inputSh.Range("A1:A30000")
Dim outputSh As Worksheet
Dim outputRng As Range
Set outputSh = Sheets("Regex")
Set outputRng = outputSh.Range("A1:A30000")
Dim time1 As Double, time2 As Double
time1 = MicroTimer
Dim arr As Variant
arr = inputRng
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
.Global = True
.ignorecase = True
.Pattern = "[^\w]"
End With
Dim i As Integer
For i = LBound(arr) To UBound(arr)
arr(i, 1) = objRegex.Replace(arr(i, 1), vbNullString)
Next i
outputRng = arr
time2 = MicroTimer
Debug.Print (time2 - time1) * 1000
End Sub
Edit:
@ThunderFrame - our part numbers generally come in the following formats:
I have thought about using regex.test on each string before launching into the replacement, but I'm not sure if this will just copy the string to then test it, in which case I may as well just make the replacement to start with.
@Slai - thanks for the link - I will look into that in more detail
It is not true that Regex has to be the winner. My second solution below is faster than even early-bound Regex! And my first solution is as fast as late-bound Regex. BOTH ARE NATIVE VBA ONLY.
Interesting question. The Original InStr method should be much faster than the results shown in the OP's question.
Its poor performance is due to string concatenation, which VBA is not good at. The longer the strings the worse it gets.
My version of the InStr method below does not use concatenation at all. It is many times faster than the original. In fact, its speed of execution matches late-bound Regex. This InStr version is completely native to VBA and very, very fast. And the longer the source strings, the faster it gets, relative to concatenation.
This method also gains a few ticks of performance by utilizing the ($) version of string functions instead of the variant version. InStrB
is slightly faster than InStr
. And using temporary string variables t
and arx
saves a good chunk of time as well.
Sub InStr_ExcelHero()
Dim inputSh As Worksheet
Dim inputRng As Range
Set inputSh = Sheets("Data")
Set inputRng = inputSh.Range("A1:A30000")
Dim outputSh As Worksheet
Dim outputRng As Range
Set outputSh = Sheets("InStr")
Set outputRng = outputSh.Range("A1:A30000")
Dim time1 As Double, time2 As Double
time1 = MicroTimer
Dim i&, j&, p&, max&, arx$, t$, res$, arr
arr = inputRng
max = Len(arr(1, 1))
Dim validVals$: validVals = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
For i = LBound(arr) To UBound(arr)
p = 0
arx = arr(i, 1)
res = Space$(max)
For j = 1 To max
t = Mid$(arx, j, 1)
If InStrB(validVals, t) Then
p = p + 1
Mid$(res, p, 1) = t
End If
Next
arr(i, 1) = Left$(res, p)
Next
outputRng = arr
time2 = MicroTimer
Debug.Print (time2 - time1) * 1000
End Sub
And the ArrayLookup version below is more than twice as fast as InStr_ExcelHero().
In fact, the method below is actually faster than early-bound Regex!
This is native VBA. No dependencies. Faster than Regex. The following method is likely the quickest way to turn every element of an array to alphanumeric... when directed from VBA... other than a custom c++ dll:
Sub ArrayLookup_ExcelHero()
Const VALS$ = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
Dim inputSh As Worksheet
Dim inputRng As Range
Set inputSh = Sheets("Data")
Set inputRng = inputSh.Range("A1:A30000")
Dim outputSh As Worksheet
Dim outputRng As Range
Set outputSh = Sheets("InStr")
Set outputRng = outputSh.Range("A1:A30000")
Dim time1 As Double, time2 As Double
time1 = MicroTimer
Dim i&, j&, p&, max&, t&, arx() As Byte, res() As Byte, arr
arr = inputRng
max = Len(arr(1, 1))
Dim Keep&(0 To 255)
For i = 1 To Len(VALS)
Keep(Asc(Mid$(VALS, i, 1))) = 1
Next
For i = LBound(arr) To UBound(arr)
p = 0
ReDim res(0 To max)
arx = StrConv(arr(i, 1), vbFromUnicode)
For j = 0 To max - 1
t = arx(j)
If Keep(t) Then
res(p) = t
p = p + 1
End If
Next
arr(i, 1) = StrConv(res, vbUnicode)
Next
outputRng = arr
time2 = MicroTimer
Debug.Print (time2 - time1) * 1000
End Sub
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With