find locations within certain lat/lon distance in r

Tags:

geosphere

I have a gridded dataset, with data available at the following locations:

lon <- seq(-179.75,179.75, by = 0.5)
lat <- seq(-89.75,89.75, by = 0.5)

I would like to find all of the data points that are within 500 km of the location:

mylat <- 47.9625
mylon <- -87.0431

I aim to use the geosphere package in R, but the method I've currently written does not seem very efficient:

require(geosphere)
dd2 <- array(dim = c(length(lon),length(lat)))
for(i in 1:length(lon)){
  for(ii in 1:length(lat)){
    clon <- lon[i]
    clat <- lat[ii]
    dd <- as.numeric(distm(c(mylon, mylat), c(clon, clat), fun = distHaversine))
    dd2[i,ii] <- dd <= 500000
  }
}

Here, I loop through each grid in the data and find if the distance is less than 500 km. I then store a variable with either TRUE or FALSE, which I can then use to average the data (other variable). From this method, I want a matrix with TRUE or FALSE for the locations within 500 km from the lat and lon shown. Is there a more efficient method for doing this?

337

asked Aug 30 '16 08:08

Emma Tebbs

3 Answers

Timings:

Comparing @nicola's and my version gives:

Unit: milliseconds

               min         lq      mean     median         uq       max neval
nicola1 184.217002 219.924647 297.60867 299.181854 322.635960 898.52393   100
floo01   61.341560  72.063197  97.20617  80.247810  93.292233 286.99343   100
nicola2   3.992343   4.485847   5.44909   4.870101   5.371644  27.25858   100

My original solution: (IMHO nicola's second version is much cleaner and faster.)

You can do the following (explanation below)

require(geosphere)
my_coord <- c(mylon, mylat)
dd2 <- matrix(FALSE, nrow=length(lon), ncol=length(lat))
outer_loop_state <- 0
for(i in 1:length(lon)){
    coods <- cbind(lon[i], lat)
    dd <- as.numeric(distHaversine(my_coord, coods))
    dd2[i, ] <- dd <= 500000
    if(any(dd2[i, ])){
      outer_loop_state <- 1
    } else {
      if(outer_loop_state == 1){
        break
      }
    }
  }

Explanation:

For the loop i apply the following logic: enter image description here

outer_loop_state is initialized with 0. If a row with at least one raster-point inside the circle is found outer_loop_state is set to 1. Once there are no more points within the circle for a given row i break.

The distm call in @nicola version basically does the same without this trick. So it calculates all rows.

Code for timings:

microbenchmark::microbenchmark(
  {allCoords<-cbind(lon,rep(lat,each=length(lon)))
  res<-matrix(distm(cbind(mylon,mylat),allCoords,fun=distHaversine)<=500000,nrow=length(lon))},
  {my_coord <- c(mylon, mylat)
  dd2 <- matrix(FALSE, nrow=length(lon), ncol=length(lat))
  outer_loop_state <- 0
  for(i in 1:length(lon)){
    coods <- cbind(lon[i], lat)
    dd <- as.numeric(distHaversine(my_coord, coods))
    dd2[i, ] <- dd <= 500000
    if(any(dd2[i, ])){
      outer_loop_state <- 1
    } else {
      if(outer_loop_state == 1){
        break
      }
    }
  }},
  {#intitialize the return
    res<-matrix(FALSE,nrow=length(lon),ncol=length(lat))
    #we find the possible value of longitude that can be closer than 500000
    #How? We calculate the distance between us and points with our same lat 
    longood<-which(distm(c(mylon,mylat),cbind(lon,mylat))<500000)
    #Same for latitude
    latgood<-which(distm(c(mylon,mylat),cbind(mylon,lat))<500000)
    #we build the matrix with only those values to exploit the vectorized
    #nature of distm
    allCoords<-cbind(lon[longood],rep(lat[latgood],each=length(longood)))
    res[longood,latgood]<-distm(c(mylon,mylat),allCoords)<=500000}
)

136

answered Oct 20 '22 14:10

Rentrop

The dist* functions of the geosphere package are vectorized, so you only need to prepare better your inputs. Try this:

#prepare a matrix with coordinates of every position
allCoords<-cbind(lon,rep(lat,each=length(lon)))
#call the dist function and put the result in a matrix
res<-matrix(distm(cbind(mylon,mylat),allCoords,fun=distHaversine)<=500000,nrow=length(lon))
#check the result
identical(res,dd2)
#[1] TRUE

As the @Floo0 answer showed, there is a lot of unnecessary calculations. We can follow another strategy: we first determine the lon and lat range that can be closer than the threshold and then we use only them to calculate the distance:

#initialize the return
res<-matrix(FALSE,nrow=length(lon),ncol=length(lat))
#we find the possible values of longitude that can be closer than 500000
#How? We calculate the distances between us and points with our same lon 
longood<-which(distm(c(mylon,mylat),cbind(lon,mylat))<=500000)
#Same for latitude
latgood<-which(distm(c(mylon,mylat),cbind(mylon,lat))<=500000)
#we build the matrix with only those values to exploit the vectorized
#nature of distm
allCoords<-cbind(lon[longood],rep(lat[latgood],each=length(longood)))
res[longood,latgood]<-distm(c(mylon,mylat),allCoords)<=500000

In this way, you calculate just lg+ln+lg*ln (lg and ln are the length of latgood and longood), i.e. 531 distances, opposed to the 259200 with my previous method.

answered Oct 20 '22 13:10

nicola

I add below a solution using the spatialrisk package. The key functions in this package are written in C++ (Rcpp), and are therefore very fast.

First, load the data:

mylat <- 47.9625
mylon <- -87.0431

lon <- seq(-179.75,179.75, by = 0.5)
lat <- seq(-89.75,89.75, by = 0.5)
df <- expand.grid(lon = lon, lat = lat)

The function spatialrisk::points_in_circle() calculates the observations within radius from a center point. Note that distances are calculated using the Haversine formula.

Timings for the spatialrisk approach compared to the @Hugh version:

spatialrisk::points_in_circle(df, mylon, mylat, radius = 5e5)

Unit: milliseconds
       expr       min        lq      mean    median        uq       max neval cld 
spatialrisk  3.071897  3.366256  5.224479  4.068124  4.809626  17.24378   100   a 
     hutils 17.507311 20.788525 29.470707 25.061943 31.066139 268.29375   100   b

The result can be easily converted to a matrix.

Have a look at the excellent answer by @philcolbourn on how to test if a point is inside a circle. See: https://stackoverflow.com/a/7227057/5440749

answered Oct 20 '22 14:10

mharinga

Related questions
                            
                                Tuning a LASSO model and predicting using tidymodels
                            
                                R: unequi join with merge function
                            
                                Binding dataframes of different length (no cbind, no merge)
                            
                                Complex numbers in R vs. Matlab
                            
                                Efficiently find the overlap between two time intervals in R
                            
                                Installing R gsl package on Ubuntu
                            
                                Child background Julia processes from R Shiny apps do not stay alive after app is closed
                            
                                How can one control the number of axis ticks within `facet_wrap()`?
                            
                                Convert a properly formatted string to data frame
                            
                                How to sort some elements in a list while leaving the rest in place?
                            
                                Component-wise addition of matrices as list elements
                            
                                Finding What You Need in R: focused searching within R and all (3,500+) CRAN Packages
                            
                                In R, how do I set the first values of a long vector to the values of a shorter one?
                            
                                How to cache in IPython Notebook?
                            
                                Confidence interval for binomial data in R?
                            
                                Interaction Plot in ggplot2
                            
                                Why doesn't "+" operate on characters in R?
                            
                                Colorize Clusters in Dendogram with ggplot2
                            
                                Is there an alternative to "revalue" function from plyr when using dplyr?
                            
                                Get the list of functions loaded in R's global environment [duplicate]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With