
Fastest way to determine COUNTRY from millions of GPS coordinates [R]

I have millions of GPS coordinates and want to quickly add a column giving the country each coordinate falls in.

My current method works but is extremely slow:

library(data.table)

#REPRODUCE DATA
data <- data.table(latitude=sample(seq(47,52,by=0.001), 1000000, replace = TRUE),
                   longitude=sample(seq(8,23,by=0.001), 1000000, replace = TRUE))

#REQUIRED PACKAGES
if (!require("sp")) install.packages("sp")
if (!require("rworldmap")) install.packages("rworldmap")
if (!require("sf")) install.packages("sf")
library(sp)
library(rworldmap)
library(sf)

#CURRENT SLOW FUNCTION
coords2country = function(points,latcol,loncol){  
  countriesSP <- getMap(resolution='low')
  pointsSP <- st_as_sf(points,coords=c(loncol,latcol),crs=4326)
  pointsSP<- as(pointsSP,"Spatial")
  # use 'over' to get indices of the Polygons object containing each point 
  indices = over(pointsSP, countriesSP)
  # return the ADMIN names of each country
  indices$ADMIN  
  #indices$ISO3 # returns the ISO3 code 
  #indices$continent   # returns the continent (6 continent model)
  #indices$REGION   # returns the continent (7 continent model)
}

#SLOW!
system.time(data[,country:=coords2country(data,"latitude","longitude"),])
#    user  system elapsed 
# 121.293   7.849 130.226 

Is there a faster/better way to do this? Thanks!

Neal Barsch asked Mar 06 '23 05:03



1 Answer

Two similar questions, linked in my comments above, ask how to get country names from coordinates. Here the OP is asking for a faster way to do that.

Based on the posts, we have three options.

  1. use the custom function in this question;
  2. use the geonames package; or
  3. use map.where() in the maps package.

The second option needs a bit of setup, so I only tested map.where(). The result is below: on 1,000,000 points it is much faster, taking about 7 seconds elapsed versus the roughly 130 seconds reported in the question.

library(data.table)
library(maps)
set.seed(111)
data <- data.table(latitude=sample(seq(47,52,by=0.001), 1000000, replace = TRUE),
                   longitude=sample(seq(8,23,by=0.001), 1000000, replace = TRUE))

system.time(data[, country := map.where(x = longitude, y = latitude)])

#   user  system elapsed 
#   7.20    0.05    7.29 
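
One thing to watch for: map.where() returns the name of the matching polygon in the maps "world" database, which can carry a subregion after a colon (names of the form "Country:Subregion"), and it returns NA for points that fall in no polygon, such as open sea. Here is a minimal sketch, assuming you only want the country part. The three test points are my own illustration (roughly Berlin, Warsaw, and a spot in the North Atlantic), not data from the question.

```r
library(maps)

# Hypothetical test points (not from the question):
# roughly Berlin, roughly Warsaw, and open ocean in the North Atlantic
lon <- c(13.40, 21.01, -40.00)
lat <- c(52.52, 52.23, 45.00)

# Look up the polygon name for each point in the "world" database
country <- map.where(database = "world", x = lon, y = lat)

# Drop any ":Subregion" suffix so only the country name remains;
# NA values (points matching no polygon) stay NA
country <- sub(":.*$", "", country)
```

The same sub() call can wrap the map.where() call inside the data.table assignment above if the subregion part is not needed.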
jazzurro answered Mar 07 '23 19:03
