I have a set of lng/lat coordinates. What would be an efficient method of calculating the greatest distance between any two points in the set (the "maximum diameter", if you will)? A naive way is to use the Haversine formula to calculate the distance between every pair of points and take the maximum, but this obviously doesn't scale well, since the number of pairs grows quadratically with the number of points.

Edit: the points are located in a sufficiently small area; they describe the area in which a person carrying a mobile device was active over the course of a single day.

Greatest distance between set of longitude/latitude points

2 Answers

Theorem #1: On a sphere, the ordering of any two great-circle distances along the surface of the earth is the same as the ordering of the straight-line distances between the same points, measured by tunnelling through the earth.
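Why the ordering is preserved on a sphere of radius R: the chord length c between two points depends only on their great-circle distance d, through a function that is increasing over the whole possible range of d. This argument assumes a sphere.

```latex
\begin{aligned}
c(d)  &= 2R\,\sin\!\left(\frac{d}{2R}\right), \qquad 0 \le d \le \pi R \\
c'(d) &= \cos\!\left(\frac{d}{2R}\right) > 0 \quad \text{for } 0 \le d < \pi R \\
d_1 < d_2 &\iff c(d_1) < c(d_2)
\end{aligned}
```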

Hence, turn your lat-long into x, y, z, based either on a spherical earth of arbitrary radius or on an ellipsoid with given shape parameters. That's a couple of sines/cosines per point (not per pair of points).
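A minimal R sketch of that conversion, assuming a spherical earth (the ellipsoid case is not covered). The function name `latlng_to_xyz` is hypothetical, and the assignment of axes is one common convention; any orthonormal choice gives the same distances.

```r
# Convert latitude/longitude in degrees to Cartesian coordinates on a sphere
# of radius R. Assumption: spherical earth. R = 1 is enough for comparisons,
# because scaling all coordinates does not change the ordering of distances.
latlng_to_xyz <- function(lat, lng, R = 1) {
  rad <- pi / 180
  phi <- lat * rad
  lam <- lng * rad
  cbind(x = R * cos(phi) * cos(lam),
        y = R * cos(phi) * sin(lam),
        z = R * sin(phi))
}

# Example: a point on the equator at the prime meridian, and the North Pole
p <- latlng_to_xyz(lat = c(0, 90), lng = c(0, 0))
p
```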

Now you have a standard 3-d problem that doesn't rely on computing Haversine distances. The distance between points is just Euclidean (Pythagoras in 3d). It needs a square root and some squares, and you can leave out the square root if you only care about comparisons.
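For a modest number of points, the exact answer can still be found by brute force on the chord lengths. This is a sketch (hypothetical function name, spherical earth assumed) that is mainly useful as a reference when checking faster methods, since `dist()` builds all n(n-1)/2 distances.

```r
# Exact farthest pair by brute force on straight-line (chord) distances.
# O(n^2) time and memory: only practical for a few thousand points.
farthest_pair_bruteforce <- function(lat, lng) {
  rad <- pi / 180
  xyz <- cbind(cos(lat * rad) * cos(lng * rad),   # unit sphere
               cos(lat * rad) * sin(lng * rad),
               sin(lat * rad))
  d <- as.matrix(dist(xyz))                       # all pairwise chords
  ij <- which(d == max(d), arr.ind = TRUE)[1, ]   # first maximal entry
  list(i = unname(ij[1]), j = unname(ij[2]), chord = max(d))
}

r <- farthest_pair_bruteforce(lat = c(0, 0, 0), lng = c(0, 10, 90))
c(r$i, r$j)
```

Because chord order equals great-circle order on a sphere, the pair found here is also the pair with the largest surface distance.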

There may be fancy spatial tree data structures to help with this, or dedicated diameter algorithms such as http://www.tcs.fudan.edu.cn/rudolf/Courses/Algorithms/Alg_ss_07w/Webprojects/Qinbo_diameter/2d_alg.htm (click 'Next' for 3d methods), or C++ code here: http://valis.cs.uiuc.edu/~sariel/papers/00/diameter/diam_prog.html
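As a simpler stand-in for the linked algorithms (not the rotating-calipers method itself), here is a planar sketch that uses the fact that the farthest pair of a planar point set is always a pair of convex-hull vertices. It relies on base R's `chull()` and on a local equirectangular projection, so it assumes a small area away from the poles and the ±180° meridian. The worst case (all points on the hull) is still quadratic, and the function name is hypothetical.

```r
# Planar diameter via the convex hull: only hull vertices can form the
# farthest pair, so the brute-force comparison is restricted to them.
# Assumptions: small area, equirectangular projection in degree units,
# no crossing of the +/-180 degree meridian.
farthest_pair_hull <- function(lat, lng) {
  rad <- pi / 180
  x <- (lng - mean(lng)) * cos(lat * rad)
  y <- lat - mean(lat)
  h <- chull(x, y)                                # indices of hull vertices
  d2 <- outer(x[h], x[h], "-")^2 + outer(y[h], y[h], "-")^2
  ij <- which(d2 == max(d2), arr.ind = TRUE)[1, ] # farthest hull pair
  c(h[ij[1]], h[ij[2]])                           # indices into the input
}

farthest_pair_hull(lat = c(0, 0, 0, 0.1), lng = c(0, 0.2, 1, 0.5))
```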

Once you've found your maximum distance pair, you can use the Haversine formula to get the distance along the surface for that pair.
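A standard Haversine implementation in R, as a sketch; the mean earth radius of 6371 km is an assumption, and any other radius can be passed in.

```r
# Haversine great-circle distance on a spherical earth.
# Inputs in degrees; R is the (assumed) mean earth radius in km.
haversine_km <- function(lat1, lng1, lat2, lng2, R = 6371) {
  rad <- pi / 180
  dphi <- (lat2 - lat1) * rad
  dlam <- (lng2 - lng1) * rad
  a <- sin(dphi / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlam / 2)^2
  2 * R * asin(pmin(1, sqrt(a)))   # pmin guards against rounding just above 1
}

# A quarter of the equator
haversine_km(0, 0, 0, 90)
```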


answered Sep 20 '22 05:09

Spacedman

I think that the following could be a useful approximation, which scales linearly instead of quadratically with the number of points, and is quite easy to implement:

1. Calculate the center of mass M of the points.
2. Find the point P₀ that has the maximum distance to M.
3. Find the point P₁ that has the maximum distance to P₀.
4. Approximate the maximum diameter with the distance between P₀ and P₁.

This can be generalized by repeating step 3 N times (each time finding the point farthest from the previous one) and taking the distance between P_{N-1} and P_N.
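The four steps, with step 3 repeated N times, can be sketched in R for any coordinate matrix (for example projected x, y or 3D x, y, z). The function name `farthest_sweep` is hypothetical, and squared distances are compared because the square root does not change the ordering.

```r
# Steps 1-4 with step 3 repeated N times (N >= 1).
# pts: one row per point, one column per coordinate.
farthest_sweep <- function(pts, N = 1) {
  pts <- as.matrix(pts)
  # squared Euclidean distance from point p to every row of pts
  sq_dist_from <- function(p) colSums((t(pts) - p)^2)
  m <- colMeans(pts)                        # step 1: center of mass M
  cur <- which.max(sq_dist_from(m))         # step 2: P0
  prev <- cur
  for (k in seq_len(N)) {                   # step 3, repeated N times
    prev <- cur
    cur <- which.max(sq_dist_from(pts[prev, ]))
  }
  c(prev, cur)                              # step 4: P_{N-1}, P_N
}

pts <- cbind(c(0, 1, 5, 2), c(0, 0, 0, 3))
farthest_sweep(pts, N = 1)
```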

Step 1 can be carried out efficiently by approximating M as the average of the longitudes and latitudes, which is acceptable when distances are "small" and the poles are sufficiently far away. The other steps could be carried out using the exact distance formula, but they are much faster if the points' coordinates can be approximated as lying on a plane. Once the "distant pair" (hopefully the pair with the maximum distance) has been found, its distance can be recalculated with the exact formula.

An example of such an approximation is the following: if φ(M) and λ(M) are the latitude and longitude of the center of mass, calculated as Σφ(P)/n and Σλ(P)/n,

x(P) = (λ(P) - λ(M) + C) cos(φ(P))
y(P) = φ(P) - φ(M) [ this is only for clarity, it can also simply be y(P) = φ(P) ]

where C is usually 0, but can be ± 360° if the set of points crosses the λ=±180° line. To find the maximum distance you simply have to find
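One way to apply C automatically, sketched in R under the assumption that the points span less than 180° of longitude: express every longitude as an offset from a reference point (here, arbitrarily, the first point), wrapped into [-180°, 180°), before averaging. This is equivalent to adding ±360° to the points on one side of the λ=±180° line. The function name `project_plane` is hypothetical.

```r
# Plane approximation around the center of mass, with the +/-360 degree
# correction applied automatically via wrapped longitude offsets.
# Assumption: the points span less than 180 degrees of longitude.
project_plane <- function(lat, lng) {
  rad <- pi / 180
  dlng <- ((lng - lng[1] + 180) %% 360) - 180   # offsets in [-180, 180)
  lam <- lng[1] + dlng                          # longitudes without the jump
  lam_m <- mean(lam)                            # lambda(M)
  phi_m <- mean(lat)                            # phi(M)
  cbind(x = (lam - lam_m) * cos(lat * rad),
        y = lat - phi_m)
}

# Two points straddling the 180 degree meridian
project_plane(lat = c(0, 0), lng = c(179, -179))
```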

max over P_N of (x(P_N) - x(P_{N-1}))² + (y(P_N) - y(P_{N-1}))²

(you don't need the square root because it is monotonic)

The same coordinate transformation could be used to repeat step 1 (in the new coordinate system) in order to have a better starting point. I suspect that if some conditions are met, the above steps (without repeating step 3) always lead to the "true distant pair" (my terminology). If I only knew which conditions...

EDIT:

I hate building on others' solutions, but someone will have to.

Still keeping the four steps above, with the optional (but probably beneficial, depending on the typical distribution of points) repetition of step 3, and following Spacedman's solution, doing the calculations in 3D removes the limitations that the points must be close together and far from the poles:

x(P) = sin(φ(P))
y(P) = cos(φ(P)) sin(λ(P))
z(P) = cos(φ(P)) cos(λ(P))

(the only approximation is that the Earth is treated as a perfect sphere)

The center of mass is given by x(M) = Σx(P)/n, etc., and the maximum one has to look for is

max over P_N of (x(P_N) - x(P_{N-1}))² + (y(P_N) - y(P_{N-1}))² + (z(P_N) - z(P_{N-1}))²

So: you first transform spherical to cartesian coordinates, then start from the center of mass, to find, in at least two steps (steps 2 and 3), the farthest point from the preceding point. You could repeat step 3 as long as the distance increases, perhaps with a maximum number of repetitions, but this won't take you away from a local maximum. Starting from the center of mass is not of much help, either, if the points are spread all over the Earth.
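A sketch of the 3D version that repeats step 3 while the distance increases, with a cap on the number of jumps. The name `sweep_3d` and the default `max_iter = 10` are assumptions; as stated above, the result is only a local maximum.

```r
# 3D sweep: unit-sphere coordinates (x = sin(lat), as in the formulas above),
# start at the center of mass, then jump to the farthest point while the
# squared chord still increases, at most max_iter extra times.
sweep_3d <- function(lat, lng, max_iter = 10) {
  rad <- pi / 180
  P <- cbind(sin(lat * rad),
             cos(lat * rad) * sin(lng * rad),
             cos(lat * rad) * cos(lng * rad))
  sq_from <- function(p) colSums((t(P) - p)^2)
  prev <- which.max(sq_from(colMeans(P)))   # step 2: P0
  cur <- which.max(sq_from(P[prev, ]))      # step 3: P1
  best <- sum((P[cur, ] - P[prev, ])^2)
  for (k in seq_len(max_iter)) {            # repeated step 3
    nxt <- which.max(sq_from(P[cur, ]))
    s <- sum((P[nxt, ] - P[cur, ])^2)
    if (s <= best) break                    # no increase: stop here
    prev <- cur; cur <- nxt; best <- s
  }
  list(i = prev, j = cur, sq_chord = best)
}

sweep_3d(lat = c(0, 0, 0, 0), lng = c(0, 10, 20, 100))
```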

EDIT 2:

I learned enough R to write down the core of the algorithm (nice language for data analysis!)

For the plane approximation, ignoring the problem around the λ=±180° line:

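# input: lng, lat (vectors, in degrees)
rad = pi / 180
# plane approximation: longitude offsets scaled by cos(latitude)
x = (lng - mean(lng)) * cos(lat * rad)
y = (lat - mean(lat))                        # mean(y) is 0 by construction
# step 2: point farthest from the center of mass
i = which.max((x - mean(x))^2 + (y       )^2)
# step 3: point farthest from point i
j = which.max((x - x[i]   )^2 + (y - y[i])^2)
# output: i, j (indices)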

On my PC it takes less than a second to find the indices i and j for 1,000,000 points.

The following 3D version is a bit slower, but works for any distribution of points (and does not need to be amended when the λ=±180° line is crossed):

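# input: lng, lat (vectors, in degrees)
rad = pi / 180
# Cartesian coordinates on the unit sphere
x = sin(lat * rad)
f = cos(lat * rad)
y = sin(lng * rad) * f
z = cos(lng * rad) * f
# step 2: point farthest from the center of mass
i = which.max((x - mean(x))^2 + (y - mean(y))^2 + (z - mean(z))^2)
# step 3: point farthest from i, then (optionally) point farthest from j
j = which.max((x - x[i]   )^2 + (y - y[i]   )^2 + (z - z[i]   )^2)
k = which.max((x - x[j]   )^2 + (y - y[j]   )^2 + (z - z[j]   )^2) # optional
# output: j, k (or i, j)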

The calculation of k can be left out (i.e., the result could be given by i and j), depending on the data and on the requirements. However, my experiments have shown that calculating a further index beyond k is useless.
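The 3D script above can be wrapped in a function that also returns the surface distance. Since the coordinates are on the unit sphere, the chord c converts directly to a great-circle distance d = 2R·asin(c/2), so no separate Haversine call is needed. This is a sketch: the function name, the `extra_step` switch for the optional index k, and the 6371 km radius are assumptions.

```r
# The 3D script as a function; returns the pair and its great-circle
# distance in km (assumed spherical earth, mean radius R).
diameter_3d <- function(lng, lat, R = 6371, extra_step = TRUE) {
  rad <- pi / 180
  x <- sin(lat * rad)
  f <- cos(lat * rad)
  y <- sin(lng * rad) * f
  z <- cos(lng * rad) * f
  i <- which.max((x - mean(x))^2 + (y - mean(y))^2 + (z - mean(z))^2)
  j <- which.max((x - x[i])^2 + (y - y[i])^2 + (z - z[i])^2)
  if (extra_step) {                         # the optional index k
    k <- which.max((x - x[j])^2 + (y - y[j])^2 + (z - z[j])^2)
    i <- j
    j <- k
  }
  chord <- sqrt((x[i] - x[j])^2 + (y[i] - y[j])^2 + (z[i] - z[j])^2)
  list(i = i, j = j, km = 2 * R * asin(min(1, chord / 2)))
}

diameter_3d(lng = c(0, 90), lat = c(0, 0))
```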

It should be remembered that, in any case, the distance between the resulting points is only an estimate: it is a lower bound on the "diameter" of the set, although very often it will be the diameter itself (how often depends on the data).

EDIT 3:

Unfortunately, the relative error of the plane approximation can, in extreme cases, be as large as 1-1/√3 ≅ 42.3%, which may be unacceptable, even if such cases are very rare. The algorithm can be modified so that the error has an upper bound of approximately 20%, which I derived with compass and straightedge (the analytic solution is cumbersome). The modified algorithm finds a pair of points with a locally maximal distance, then repeats the same steps, but this time starting from the midpoint of the first pair, possibly finding a different pair:

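# input: lng, lat (vectors, in degrees)
rad = pi / 180
x = (lng - mean(lng)) * cos(lat * rad)
y = (lat - mean(lat))
# first pass: start at the center of mass and keep jumping to the
# farthest point while the distance grows
i.n_1 = 1 # n_1: n-1
x.n_1 = mean(x)
y.n_1 = 0 # = mean(y)
s.n_1 = 0 # s: square of distance
repeat {
   s = (x - x.n_1)^2 + (y - y.n_1)^2
   i.n = which.max(s)
   x.n = x[i.n]
   y.n = y[i.n]
   s.n = s[i.n]
   if (s.n <= s.n_1) break   # no increase: (i.n, i.n_1) is the first pair
   i.n_1 = i.n
   x.n_1 = x.n
   y.n_1 = y.n
   s.n_1 = s.n
}
# second pass: the same, but starting from the midpoint of the first pair
i.m_1 = 1
x.m_1 = (x.n + x.n_1) / 2
y.m_1 = (y.n + y.n_1) / 2
s.m_1 = 0
m_ok  = TRUE
repeat {
   s = (x - x.m_1)^2 + (y - y.m_1)^2
   i.m = which.max(s)
   # give up on the second pair if the sweep returns to a point of the first
   if (i.m == i.n || i.m == i.n_1) { m_ok = FALSE; break }
   x.m = x[i.m]
   y.m = y[i.m]
   s.m = s[i.m]
   if (s.m <= s.m_1) break
   i.m_1 = i.m
   x.m_1 = x.m
   y.m_1 = y.m
   s.m_1 = s.m
}
# keep the second pair only if it was found and is farther apart
if (m_ok && s.m > s.n) {
   i = i.m
   j = i.m_1
} else {
   i = i.n
   j = i.n_1
}
# output: i, j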

The 3D algorithm can be modified in a similar way. It is possible (both in the 2D and in the 3D case) to start over once again from the midpoint of the second pair of points (if found). The upper bound in this case is "left as an exercise for the reader" :-).
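One possible reading of "modified in a similar way" for 3D, sketched in R: both sweeps work on unit-sphere Cartesian coordinates, and the midpoint of the first pair is just the average of the two vectors (it lies inside the sphere, which is harmless for distance comparisons). This is not the answer author's code; the names `modified_3d` and `sweep` are hypothetical.

```r
# Modified algorithm in 3D: a first sweep from the center of mass, a second
# sweep from the midpoint of the first pair, keep the farther pair.
modified_3d <- function(lng, lat) {
  rad <- pi / 180
  P <- cbind(sin(lat * rad),
             cos(lat * rad) * sin(lng * rad),
             cos(lat * rad) * cos(lng * rad))
  sq_from <- function(p) colSums((t(P) - p)^2)
  # Jump to the farthest point while the squared distance increases.
  # Returns the last pair, or NULL if a point in `forbidden` is reached.
  sweep <- function(start, forbidden = integer(0)) {
    p <- start; i_prev <- 1L; s_prev <- 0
    repeat {
      s <- sq_from(p)
      i <- which.max(s)
      if (i %in% forbidden) return(NULL)
      if (s[i] <= s_prev) return(list(i = i, j = i_prev, s = s[i]))
      i_prev <- i; p <- P[i, ]; s_prev <- s[i]
    }
  }
  first <- sweep(colMeans(P))
  second <- sweep((P[first$i, ] + P[first$j, ]) / 2,
                  forbidden = c(first$i, first$j))
  if (!is.null(second) && second$s > first$s) second else first
}

modified_3d(lng = c(0, 10, 20, 100), lat = c(0, 0, 0, 0))
```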

Comparison of the modified algorithm with the (too) simple algorithm has shown, for normal distributions and for uniform distributions over a square, a near doubling of processing time and a reduction of the average error from about 0.6% to about 0.03% (order-of-magnitude figures). A further restart from the midpoint results in only a slightly better average error, but almost the same maximum error.
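An experiment of this kind can be sketched as follows: run the simple algorithm and an exact brute-force search on random samples and record the relative error 1 - found/true. The sample size, the number of trials and the standard normal distribution are arbitrary choices, not the settings behind the figures quoted above, so this makes no claim about reproducing them. The random points are used directly as plane coordinates.

```r
# Simple two-step algorithm (steps 1-4) on plane coordinates
simple_2d <- function(x, y) {
  i <- which.max((x - mean(x))^2 + (y - mean(y))^2)
  j <- which.max((x - x[i])^2 + (y - y[i])^2)
  sqrt((x[i] - x[j])^2 + (y[i] - y[j])^2)
}
# Exact diameter by brute force (quadratic)
exact_2d <- function(x, y) max(dist(cbind(x, y)))

# Relative error of the simple algorithm over repeated random samples
relative_errors <- function(trials = 200, n = 300, seed = 1) {
  set.seed(seed)
  replicate(trials, {
    x <- rnorm(n)
    y <- rnorm(n)
    1 - simple_2d(x, y) / exact_2d(x, y)
  })
}

err <- relative_errors()
c(mean = mean(err), max = max(err))
```

The same harness can be pointed at the modified algorithm to compare the two.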

EDIT 4:

I have yet to study this article, but it looks like the 20% I found with compass and straightedge is in fact 1-1/√(5-2√3) ≅ 19.3%.
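The two bounds quoted above, evaluated numerically in R:

```r
bound_simple   <- 1 - 1 / sqrt(3)                # EDIT 3
bound_modified <- 1 - 1 / sqrt(5 - 2 * sqrt(3))  # EDIT 4
round(100 * c(bound_simple, bound_modified), 1)  # percent: 42.3 19.3
```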

answered Sep 20 '22 05:09

20 revs

Tags: algorithm, r, latitude-longitude, cran, geospatial

Jeroen Ooms
