Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Closest distance between lat/longs in large dataset in excel vba

Tags:

Beginner looper here...I am working on this well spacing project that looks at lat/longs and determines the next closest well. I think I may be creating an infinite loop or the program is just taking forever to run (It's looping through 15,000 rows). My main struggle has been trying to make sure each location is compared to every location in the dataset. From there I take the 2nd lowest distance (since the lowest will be zero when it compares to itself).

Sub WellSpacing()
Dim r As Integer, c As Integer, L As Integer, lastrow As Integer
Dim lat1 As Double, lat2 As Double, long1 As Double, long2 As Double
Dim distance As Double, d1 As Double, d2 As Double, d3 As Double
Dim PI As Double

PI = Application.WorksheetFunction.PI()
L = 2
r = 3
c = 10
lastrow = Sheets("Test").Cells(Rows.Count, "J").End(xlUp).Row

For L = 2 To lastrow
    For r = 2 To lastrow
        lat1 = Sheets("Test").Cells(L, c)
        long1 = Sheets("Test").Cells(L, c + 1)
        lat2 = Sheets("Test").Cells(r, c)
        long2 = Sheets("Test").Cells(r, c + 1)
        d1 = Sin((Abs((lat2 - lat1)) * PI / 180 / 2)) ^ 2 + Cos(lat1 * PI / 180) * Cos(lat2 * PI / 180) * Sin(Abs(long2 - long1) * PI / 180 / 2) ^ 2
        d2 = 2 * Application.WorksheetFunction.Atan2(Sqr(1 - d1), Sqr(d1))
        d3 = 6371 * d2 * 3280.84
        Sheets("Working").Cells(r - 1, c - 9) = d3
    Next r

    Sheet2.Activate
    Range("A:A").Sort Key1:=Range("A1"), Order1:=xlAscending
    distance = Sheet2.Range("A2")
    Sheets("Test").Cells(L, c + 2) = distance
    Sheet2.Range("A:A").Clear
    Sheet1.Activate

Next L
End Sub
like image 594
kdreed15 Avatar asked Dec 25 '17 15:12

kdreed15


People also ask

How do I find the nearest location using latitude and longitude in Excel?

To get the latitude of the address in cell B2, use the formula = GetLatitude(B2) To get the longitude of the address in cell B2, use the formula = GetLongitude(B2) To get both the latitude and longitude of the address in cell B2, use the formula = GetCoordinates(B2)

How do I calculate distance between GPS coordinates in Excel?

Here are the formulas for degree coordinates: Cell B5: =distvincenty(B2,C2,B3,C3) Cell D5: =MOD(DEGREES(ATAN2(COS(B2*PI()/180) *SIN(B3*PI()/180)-SIN(B2*PI()/180) *COS(B3*PI()/180) *COS(C3*PI()/180-C2*PI()/180), SIN(C3*PI()/180-C2*PI()/180) *COS(B2*PI()/180)))+360,360)

How do you find the distance between two Geocodes?

The distance difference in the longitude direction is EarthRadius * longitude difference * cos(latitude) . We multiply by cos(lat) because the longitude degrees don't make the same km distance if the latitude changes.


2 Answers

I've been working with geo-location math (aka, coordinate geometry) a lot lately and wrote a sub to do pretty much the same thing you're seeking.

Your code probably isn't creating an infinite loop, but calculating distances between thousands of coordinates can be very processor-intensive and even minor changes to your code can have a huge impact on processing time.


Calculating closest coordinate pair: Brute Force Method

There are a number of algorithms for determining closest points however the easiest to code (therefore possibly best for one-time use) is known as the Brute Force Method.

For p1 = 1 to numPoints
    For p2 = p1 + 1 to numPoints
        ...calculate {distance}
        ...if {distance} < minDistance then minDist = {distance}
    Next p2
Next p1

Using this method, distance will be calculated between x * ( n - 1 ) / 2 pairs of points.

For example, a list of 5 points would require 10 comparisons:

  1. Point 1Point 2
  2. Point 1Point 3
  3. Point 1Point 4
  4. Point 1Point 5
  5. Point 2Point 3
  6. Point 2Point 4
  7. Point 2Point 5
  8. Point 3Point 4
  9. Point 3Point 5
  10. Point 4Point 5

Since additional points will increase execution time exponentially, this method can create some lengthy processing times, especially on a slower machine or with an excessive number of points.

The methods I use for calculating distances between points and for comparing distances between lists of points are far from the [code-heavier] most-efficient alternatives, but they work for my "one-off" needs.

Depending on my purposes, I'll switch (almost identical code) between Excel & Access, but Access is much faster, so you may want to move your list into a table and do it that way.

One of the lists of points I compare has 252 items, which requires 31,628 individual comparisons using this "easy-code" method. In Excel, the process completes in 1.12 seconds, which is Access it only takes 0.16 seconds.

This may not seem like a big difference until we starting working with longer lists of points: another list of mine (closer to the size of yours) has about 12,000 points, which requires 71,994,000 calculations using the Brute Force method. In Access, the process completes in 8.6 minutes, so I estimate it would take about an hour in Excel.

Of course, all of these times are based on my operating system, processing power, Office version, etc. VBA isn't ideal for this level of computation, and everything you can do to reduce length of code will make a big difference, including commenting-out the status bar updates, immediate-window output, turn off screen updates, etc.

This code is a little messy & un-commented since I slapped it together for my own purposes, but it works for me. Let me know if you have any questions about how it works. All calculations are in metric but can be easily converted.

Sub findShortestDist_Excel()

    Const colLatitude = "C" ' Col.C = Lat, Col.D = Lon
    Dim pointList As Range, pointCount As Long, c As Range, _
        arrCoords(), x As Long, y As Long
    Dim thisDist As Double, minDist As Double, minDist_txt As String
    Dim cntCurr As Long, cntTotal As Long, timerStart As Single

    timerStart = Timer
    Set pointList = Sheets("Stops").UsedRange.Columns(colLatitude)
    pointCount = WorksheetFunction.Count(pointList.Columns(1))

    'build array of numbers found in Column C/D
    ReDim arrCoords(1 To 3, 1 To pointCount)
    For Each c In pointList.Columns(1).Cells
        If IsNumeric(c.Value) And Not IsEmpty(c.Value) Then
            x = x + 1
            arrCoords(1, x) = c.Value
            arrCoords(2, x) = c.Offset(0, 1).Value
        End If
    Next c

    minDist = -1
    cntTotal = pointCount * (pointCount + 1) / 2

    'loop through array
    For x = 1 To pointCount
        For y = x + 1 To pointCount
            If (arrCoords(1, x) & arrCoords(2, x)) <> (arrCoords(1, y) & arrCoords(2, y)) Then
                cntCurr = cntCurr + 1
                thisDist = Distance(arrCoords(1, x), arrCoords(2, x), _
                    arrCoords(1, y), arrCoords(2, y))
                'check if this distance is the smallest yet
                If ((thisDist < minDist) Or (minDist = -1)) And thisDist > 0 Then
                    minDist = thisDist
                    'minDist_txt = arrCoords(1, x) & "," & arrCoords(2, x) & " -> " & arrCoords(1, y) & "," & arrCoords(2, y)
                End If
                'Application.StatusBar = "Calculating Distances: " & Format(cntCurr / cntTotal, "0.0%")
            End If
        Next y
        'DoEvents
    Next x

    Debug.Print "Minimum distance: " & minDist_txt & " = " & minDist & " meters"
    Debug.Print "(" & Round(Timer - timerStart, 2) & "sec)"
    Application.StatusBar = "Finished.  Minimum distance: " & minDist_txt & " = " & minDist & "m"

End Sub

Note that the procedure above is dependent on the following (which has slightly different versions for Access vs. Excel):

Excel: Calculate distance between points

Public Function Distance(ByVal lat1 As Double, ByVal lon1 As Double, _
    ByVal lat2 As Double, ByVal lon2 As Double) As Double
'returns Meters distance in Excel (straight-line)
    Dim theta As Double: theta = lon1 - lon2
    Dim Dist As Double: Dist = Math.Sin(deg2rad(lat1)) * Math.Sin(deg2rad(lat2)) + Math.Cos(deg2rad(lat1)) * Math.Cos(deg2rad(lat2)) * Math.Cos(deg2rad(theta))
    Dist = rad2deg(WorksheetFunction.Acos(Dist))
    Distance = Dist * 60 * 1.1515 * 1.609344 * 1000
End Function

Function deg2rad(ByVal deg As Double) As Double
    deg2rad = (deg * WorksheetFunction.PI / 180#)
End Function

Function rad2deg(ByVal rad As Double) As Double
    rad2deg = rad / WorksheetFunction.PI * 180#
End Function

...and alternative code, for Microsoft Access:

Access: Shortest Distance

Sub findShortestDist_Access()

    Const tableName = "Stops"
    Dim pointCount As Long, arrCoords(), x As Long, y As Long
    Dim thisDist As Double, minDist As Double
    Dim cntCurr As Long, cntTotal As Long, timerStart As Single
    Dim rs As Recordset

    timerStart = Timer

    Set rs = CurrentDb.OpenRecordset("SELECT * FROM " & tableName)
    With rs
        .MoveLast
        .MoveFirst
        pointCount = .RecordCount

        'build array of numbers found in Column C/D
        ReDim arrCoords(1 To 2, 1 To pointCount)
        Do While Not .EOF
            x = x + 1
            arrCoords(1, x) = !stop_lat
            arrCoords(2, x) = !stop_lon
            .MoveNext
        Loop
        .Close
    End With

    minDist = -1
    cntTotal = pointCount * (pointCount + 1) / 2
    SysCmd acSysCmdInitMeter, "Calculating Distances:", cntTotal

    'loop through array
    For x = 1 To pointCount
        For y = x + 1 To pointCount
                cntCurr = cntCurr + 1
                thisDist = Distance(arrCoords(1, x), arrCoords(2, x), _
                    arrCoords(1, y), arrCoords(2, y))
                'check if this distance is the smallest yet
                If ((thisDist < minDist) Or (minDist = -1)) And thisDist > 0 Then
                    minDist = thisDist
                End If
                SysCmd acSysCmdUpdateMeter, cntCurr
        Next y
        DoEvents
    Next x
    SysCmd acSysCmdRemoveMeter
    Debug.Print "Minimum distance: " & minDist_txt & " = " & minDist & " meters"
    Debug.Print "(" & Round(Timer - timerStart, 2) & "sec)"

End Sub

Note that the procedure above is dependent on the following... (Access may handle mass-calculations more quickly, but we have to build some functions ourselves that are built-in to Excel)

Access: Calculate distance between points

Const pi As Double = 3.14159265358979

Public Function Distance(ByVal lat1 As Double, ByVal lon1 As Double, _
    ByVal lat2 As Double, ByVal lon2 As Double) As Double
'returns Meters distance in Access (straight-line)
    Dim theta As Double: theta = lon1 - lon2
    Dim dist As Double
    dist = Math.Sin(deg2rad(lat1)) * Math.Sin(deg2rad(lat2)) + Math.Cos(deg2rad(lat1)) _
        * Math.Cos(deg2rad(lat2)) * Math.Cos(deg2rad(theta))
    dist = rad2deg(aCos(dist))
    Distance = dist * 60 * 1.1515 * 1.609344 * 1000
End Function

Function deg2rad(ByVal deg As Double) As Double
    deg2rad = (deg * pi / 180#)
End Function

Function rad2deg(ByVal rad As Double) As Double
    rad2deg = rad / pi * 180#
End Function

Function aTan2(x As Double, y As Double) As Double
    aTan2 = Atn(y / x)
End Function

Function aCos(x As Double) As Double
    On Error GoTo aErr
    If x = 0 Or Abs(x) = 1 Then
        aCos = 0
    Else
        aCos = Atn(-x / Sqr(-x * x + 1)) + 2 * Atn(1)
    End If
    Exit Function
aErr:
    aCos = 0
End Function

Planar Case

Another method of calculating closer points is called Planar Case. I haven't seen any ready-to-use code samples and I don't need it bad enough to bother coding it, but the gist of it is this:

Planar Case

Read about this and more about the Closest pair of points problem on Wikipedia.

like image 191
ashleedawg Avatar answered Sep 19 '22 13:09

ashleedawg


I would recommend using arrays as @Qharr said. I would also look to speed up the process by including some logic steps that avoid doing the complex math on every set of points.

What I mean is that you can do a Rough Estimate first to see whether or not to bother doing the actual calculations. I went with looking at whether or not either the Lat or Long of the current position is closer than the last closest point, but you could do anything you wanted.

I would change your code to something like:

Sub WellSpacing()
    Dim R As Integer, C As Integer, L As Integer, LastRow As Integer, Shortest() As Integer
    Dim Lats() As Double, Longs() As Double, Distances() As Double
    Dim Distance As Double, D1 As Double, D2 As Double, D3 As Double
    Dim PI As Double

    On Error Resume Next
    PI = Application.WorksheetFunction.PI()
    L = 2
    R = 3
    C = 10
    LastRow = Sheets("Test").Cells(Rows.Count, 10).End(xlUp).Row
    ReDim Lats(1 To (LastRow - 1)) As Double
    ReDim Longs(1 To (LastRow - 1)) As Double
    ReDim Distances(1 To (LastRow - 1)) As Double
    ReDim Shortest(1 To (LastRow - 1)) As Integer

    For L = 2 To LastRow
        Lats(L - 1) = Sheets("Test").Range("J" & L).Value
        Longs(L - 1) = Sheets("Test").Range("K" & L).Value
    Next L

    For L = 1 To (LastRow - 1)
        'This is a method of setting an initial value that can't be obtained  through the caclucations (so you will know if any calcs have been done or not).
        Distances(L) = -1
        For R = 1 To (LastRow - 1)
            'This minimises your calculations by 15,000 to begin with
            If R = L Then GoTo Skip_This_R
            'This skips checking the previous distances if it is the first calculation being checked.
            If Distances(L) = -1 Then GoTo Skip_Check
            'If there has already been a distance calculated, this does a rough check of whether the Lat or Long is closer. If neither
            'the Lat or Long are closer than the current closest, then it will skip it. This reduces the code by 7 lines for most pairs.
            If Abs(Lats(L) - Lats(R)) < Abs(Lats(L) - Lats(Shortest(L))) Or Abs(Longs(L) - Longs(R)) < Abs(Longs(L) - Longs(Shortest(L))) Then
Skip_Check:
                    D1 = Sin((Abs((Lats(R) - Lats(L))) * PI / 180 / 2)) ^ 2 + Cos(Lats(L) * PI / 180) * Cos(Lats(R) * PI / 180) * Sin(Abs(Longs(R) - Longs(L)) * PI / 180 / 2) ^ 2
                    D2 = 2 * Application.WorksheetFunction.Atan2(Sqr(1 - D1), Sqr(D1))
                    D3 = 6371 * D2 * 3280.84
                    If D3 < Distances(L) Or Distances(L) = -1 Then
                            Distances(L) = D3
                            'This stores the index value in the array of the closest Lat/Long point so far.
                            Shortest(L) = R
                    End If
            End If
Skip_This_R:
        Next R
        'This puts the resulting closest distance into the corresponding cell.
        Sheets("Test").Range("L" & (L + 1)).Value = Distances(L)
        'This clears any previous comments on the cell.
        Sheets("Test").Range("L" & (L + 1)).Comments.Delete
        'This adds a nice comment to let you know which Lat/Long position it is closest to.
        Sheets("Test").Range("L" & (L + 1)).AddComment "Matched to Row " & (Shortest(L) + 1)
    Next L
End Sub
like image 36
Sercho Avatar answered Sep 21 '22 13:09

Sercho