Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Speed up VBA using Array with Named Range

Tags:

arrays

excel

vba

I have a spreadsheet that I've been working on for over a month to sort and optimize coordinates (sometimes exceeding 100,000 rows) and it is UNBEARABLY slow once I start importing files over 5,000 rows (it has taken several hours to complete the calculations and sorting process on data sets over 25,000 rows). The processing time grows exponentially with the number of coordinates imported. I've researched Stack Overflow to help me with some of the code and included a few safety nets for error handling and exiting the sub if there is no data.

The bit of code I'm using to actually sort the coordinates to find nearest neighbor coordinates and that I need help with is under the remark ' Sort coordinates in Point List Data looking for shortest distance between points, located at approximately line 58 of 109 in my code below.

Simple quadrant coordinates (X, Y and Z) are in columns H, I , and J respectively, starting at row 6. The named range is PosXYZ and the formula for this named range is:

=INDEX(Optimizer!$H:$H, ROW(Optimizer!$H$5) + 1):INDEX(Optimizer!$L:$L, MATCH(bignum, Optimizer!$I:$I)). 

bignum is defined as =1E+307*17.9769313486231.

  • Column K is populated with the Pythagorean Theorem to calculate the distance between the current data point X,Y and the previous data point X,Y in the list.

  • Column L is populated with a list of sequential row numbers created when the data is imported so that the original sort order of the data can be restored using a separate piece of VBA code.

I'm trying to see if using an array would greatly speed up the time it takes to run this point list optimizer, and I'm hoping that someone might be able to help me figure out how to get this portion of my code to run exponentially faster.


I found the following similar question, and I'm wondering if this approach is something that I could use to help speed up my processing time: How do you speed up VBA code with a named range?

I've learned a lot from this site, and I'm hoping that someone has the patience and the knowledge to help me figure this one out. I don't have a lot of experience using arrays in VBA.

Sample Excel file with 2904 data points and VBA code can be found here.

Sub Optimize_PL()

   ' Add an error handler
    On Error GoTo ErrorHandler

    ' Speed up sub-routine by turning off screen updating and auto calculating until the end of the sub-routine
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual

    ' Declare variable names and types
    Dim rInp    As Range
    Dim rTmp    As Range
    Dim i       As Long
    Dim n       As Long
    Dim sFrm    As String
    Dim PosX    As String
    Dim PosY    As String
    Dim PosZ    As String
    Dim SortOrder As String
    Dim LastRow As Long
    Dim hLastRow As Long
    Dim lLastRow As Long

    ' Find number of populated cells in Column H and Column L (not including the 5 column header rows)
    hLastRow = Cells(Rows.Count, "H").End(xlUp).Row - 5
    lLastRow = Cells(Rows.Count, "L").End(xlUp).Row - 5

    ' Check for existing Point List Data to avoid error
    If hLastRow < 2 Then
        MsgBox "Not enough data points are available to optimize." & vbNewLine & _
               "" & vbNewLine & _
               "Column H populated rows: " & hLastRow, vbInformation, "Error Message"
        GoTo ErrorHandler

    ElseIf lLastRow < 2 Then
        MsgBox "Original sort order row numbers not available in Column L," & vbNewLine & _
               "" & vbNewLine & _
               "Original sort order canot be restored without Row # data." & vbNewLine & _
               "Column L populated rows: " & lLastRow, vbInformation, "Error Message"
        Err.Number = 0
        GoTo ErrorHandler

    ElseIf hLastRow <> lLastRow Then
        MsgBox "The number of rows of coordinate data does not match the" & vbNewLine & _
               "number of rows in the Row # column. There is no way to" & vbNewLine & _
               "restore the original sort order." & vbNewLine & _
               "" & vbNewLine & _
               "Column H populated rows: " & hLastRow & vbNewLine & _
               "Column L populated rows: " & lLastRow, vbInformation, "Error Message"
        Err.Number = 0
        GoTo ErrorHandler

    End If

    ' Timer Start (calculate the length of time this VBA code takes to complete)
    StartTime = Timer

    ' Sort coordinates in Point List Data looking for shortest distance between points
    Set rInp = Range("PosXYZ").Resize(, 4)
    n = rInp.Rows.Count
    i = 0

    For i = 1 To n - 1
        Application.StatusBar = i + 1 & " of " & n & "    Calculating for " & SecondsElapsed & " seconds" & "        Estimated Time Remaining:  " & TimeRemaining & "  seconds"
        SecondsElapsed = Round(Timer - StartTime) ' Change to StartTime, 2) to display seconds two decimal places out
        TimeRemaining = Round((SecondsElapsed / (i + 1)) * (n - (i + 1))) ' Change to i + 1)),2) to display seconds two decimal places out

        Set rTmp = rInp.Offset(i).Resize(n - i, 5)

        With rTmp
            PosX = .Cells(0, 1).Address(ReferenceStyle:=xlR1C1)
            PosY = .Cells(0, 2).Address(ReferenceStyle:=xlR1C1)
            PosZ = .Cells(0, 3).Address(ReferenceStyle:=xlR1C1)
            SortOrder = .Cells(0, 5).Address(ReferenceStyle:=xlR1C1)

        sFrm = Replace(Replace(Replace(Replace("=SQRT((RC[-3] - PosX)^2 + (RC[-2] - PosY)^2)", "PosX", PosX), "PosY", PosY), "PosZ", PosZ), "SortOrder", SortOrder)
        sFrm = Replace(Replace(Replace(Replace(sFrm, "PosX", PosX), "PosY", PosY), "PosZ", PosZ), "SortOrder", SortOrder)
        .Columns(4).FormulaR1C1 = sFrm
        .Sort Key1:=.Range("D1"), Header:=xlNo

        End With

    Next i

    ' Timer Stop (calculate the length of time this VBA code took to complete)
    SecondsElapsed = Round(Timer - StartTime, 2)

    ' Turn screen updating and auto calculating back on since file processing is now complete
    Application.ScreenUpdating = True
    Application.Calculation = xlCalculationAutomatic

    ' Message to report VBA code processing time after file selection and number of data rows imported
    MsgBox "Calculated optimized travel path between coordinates in " & vbNewLine & _
           "" & vbNewLine & _
           "            " & SecondsElapsed & " seconds"

' Reset to defaults in the event of a processing error during the sub-routine execution
ErrorHandler:
    Application.StatusBar = True
    Application.ScreenUpdating = True
    Application.Calculation = xlCalculationAutomatic
    If Err.Number <> 0 Then

    ' Display a message to the user including the error code in the event of an error during execution
    MsgBox "An error number " & Err.Number & " was encountered!" & vbNewLine & _
           "Part or all of this VBA code was not completed.", vbInformation, "Error Message"
    End If

End Sub
like image 992
JRN0504 Avatar asked Mar 19 '18 20:03

JRN0504


2 Answers

Yes you can speed up this code a lot using arrays: the code below is approx 20 times faster.

Sub Optimize_PL2()

' Add an error handler
    On Error GoTo ErrorHandler

    ' Speed up sub-routine by turning off screen updating and auto calculating until the end of the sub-routine
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual

    ' Define variable names and types
    Dim i As Long
    Dim j As Long
    Dim k As Long
    Dim hLastRow As Long
    Dim lLastRow As Long

    Dim varData As Variant
    Dim dData() As Double
    Dim dResult() As Double
    Dim jRow() As Long
    Dim dThisDist As Double
    Dim dSmallDist As Double
    Dim jSmallRow As Long

    ' Find number of populated cells in Column H and Column L (not including the 5 column header rows)
    hLastRow = Cells(Rows.Count, "H").End(xlUp).Row - 5
    lLastRow = Cells(Rows.Count, "L").End(xlUp).Row - 5

    ' Check for existing Point List Data to avoid error
    If hLastRow < 2 Then
        MsgBox "Not enough data points are available to optimize." & vbNewLine & _
               "" & vbNewLine & _
               "Column H populated rows: " & hLastRow, vbInformation, "Error Message"
        GoTo ErrorHandler

    ElseIf lLastRow < 2 Then
        MsgBox "Original sort order row numbers not available in Column L," & vbNewLine & _
               "" & vbNewLine & _
               "Original sort order canot be restored without Row # data." & vbNewLine & _
               "Column L populated rows: " & lLastRow, vbInformation, "Error Message"
        Err.Number = 0
        GoTo ErrorHandler

    ElseIf hLastRow <> lLastRow Then
        MsgBox "The number of rows of coordinate data does not match the" & vbNewLine & _
               "number of rows in the Row # column. There is no way to" & vbNewLine & _
               "restore the original sort order." & vbNewLine & _
               "" & vbNewLine & _
               "Column H populated rows: " & hLastRow & vbNewLine & _
               "Column L populated rows: " & lLastRow, vbInformation, "Error Message"
        Err.Number = 0
        GoTo ErrorHandler

    End If

    On Error GoTo 0
    ' Timer Start (calculate the length of time this VBA code takes to complete)
    StartTime = Timer

    varData = Worksheets("Optimizer").Range("H6").Resize(hLastRow, 5).Value2
    ReDim dResult(1 To hLastRow, 1 To 5) As Double
    ReDim dData(1 To hLastRow, 1 To 5) As Double
    '
    ' copy vardata into data coercing to double
    ' (repeated arithmetic is faster on doubles than variants)
    '
    For j = LBound(varData) To UBound(varData)
        For k = LBound(varData, 2) To UBound(varData, 2)
            dData(j, k) = CDbl(varData(j, k))
            If j = 1 Then
                dResult(j, k) = dData(j, k)
            End If
        Next k
    Next j
    '
    ' look for shortest distance row
    '
    For i = LBound(dResult) To UBound(dResult) - 1
        '
        ' calc distance from this row to all remaining rows and find shortest
        '
        jSmallRow = -1
        dSmallDist = 1 * 10 ^ 307
        For j = 2 To UBound(dData)
            If dData(j, 3) > -1 And j <> i Then
                dThisDist = Sqr((dResult(i, 1) - dData(j, 1)) ^ 2 + (dResult(i, 2) - dData(j, 2)) ^ 2)
                If dThisDist < dSmallDist Then
                    jSmallRow = j
                    dSmallDist = dThisDist
                End If
            End If
        Next j
        '
        ' copy jsmallrow row to i+1
        '
        If jSmallRow > -1 Then
            For k = 1 To 2
                dResult(i + 1, k) = dData(jSmallRow, k)
            Next k
            dResult(i + 1, 4) = dSmallDist
            dResult(i + 1, 5) = jSmallRow
            '
            ' set smallrow so it does not get used again
            '
            dData(jSmallRow, 3) = -1
        End If
    Next i
    '
    ' put data back on sheet
    '
    Worksheets("Optimizer").Range("H6").Resize(hLastRow, 5).Value2 = dResult



    ' Timer Stop (calculate the length of time this VBA code took to complete)
    SecondsElapsed = Round(Timer - StartTime, 2)

    ' Turn screen updating and auto calculating back on since file processing is now complete
    Application.ScreenUpdating = True
    Application.Calculation = xlCalculationAutomatic

    ' Message to report VBA code processing time after file selection and number of data rows imported
    MsgBox "Calculated optimized travel path between coordinates in " & vbNewLine & _
           "" & vbNewLine & _
         "            " & SecondsElapsed & " seconds"

    ' Reset to defaults in the event of a processing error during the sub-routine execution
ErrorHandler:
    Application.StatusBar = True
    Application.ScreenUpdating = True
    Application.Calculation = xlCalculationAutomatic
    If Err.Number <> 0 Then

        ' Display a message to the user including the error code in the event of an error during execution
        MsgBox "An error number " & Err.Number & " was encountered!" & vbNewLine & _
               "Part or all of this VBA code was not completed.", vbInformation, "Error Message"
    End If

End Sub
like image 170
Charles Williams Avatar answered Oct 04 '22 12:10

Charles Williams


Aside from switching to a different algorihm (e.g. the k-d tree), here are a few things that will speed up the code:

  1. Convert formulas to values before sorting
  2. Only update status bar periodically (e.g. every 100 loops)
  3. Delete the dynamic named-range "PosXYZ" and use hLastRow that was already calculated. Dynamic named ranges are re-calculated when the sheet is calculated and thus can be costly.

Updated code:

Const HeaderRow = 5
Set rInp = Range(Cells(HeaderRow + 1, 8), Cells(hLastRow, 11))
n = rInp.Rows.Count
For i = 1 To n - 1
    If i Mod 100 = 0 Then
        Application.StatusBar = i + 1 & " of " & n & "    Calculating for " & SecondsElapsed & " seconds" & "        Estimated Time Remaining:  " & TimeRemaining & "  seconds"
        SecondsElapsed = Round(Timer - StartTime) ' Change to StartTime, 2) to display seconds two decimal places out
        TimeRemaining = Round((SecondsElapsed / (i + 1)) * (n - (i + 1))) ' Change to i + 1)),2) to display seconds two decimal places out
    End If
    Set rTmp = rInp.Offset(i).Resize(n - i, 5)
    With rTmp
        Dim TargetRow As Long
        TargetRow = HeaderRow + i
        sFrm = "=SQRT((RC[-3] - R" & TargetRow & "C[-3])^2 + (RC[-2] - R" & TargetRow & "C[-2])^2)"
        With .Columns(4)
            .FormulaR1C1 = sFrm
            .Calculate
            .Value = .Value
        End With
        .Sort Key1:=.Range("D1"), Header:=xlNo
    End With
Next i
like image 40
Rachel Hettinger Avatar answered Oct 04 '22 12:10

Rachel Hettinger