From a list of 10,000 stations with decimal coordinates, I am trying to identify the stations that are within 100 feet of each other, based on the distance calculated between them, and create a subset of those stations. In the final list I want the names of the stations that are within 100 feet of each other, their latitudes and longitudes, and the distance between them.
I found similar questions for other platforms, such as MathWorks (using rangesearch), SQL, or Java, but none for R.
Is there a way to do this in R? The closest answer I found was in "Listing number of observations by location", which lists the number of observations within a distance, but the answers there seem incomplete and do not determine which stations are within a particular distance of each other.
Basically I am trying to figure out which stations are co-located.
I'd really appreciate any help with this.
Two approaches.
The first creates a distance matrix using earth.dist(...) in the fossil package, and then takes advantage of data.tables to assemble the table of results.
The second uses distHaversine(...) in the geosphere package to calculate distances and assemble the final colocation table in one step. The latter approach may or may not be faster, but will certainly be more memory efficient, as it never stores the full distance matrix. Also, this approach is amenable to using other distance measures in geosphere, e.g., distVincentySphere(...), distVincentyEllipsoid(...), or distMeeus(...).
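To put the memory point in numbers for the 10,000 stations in the question, here is a back-of-the-envelope sketch in base R. It assumes each distance is stored as an 8-byte double and ignores object overhead, so the figures are rough estimates rather than measurements.

```r
n <- 10000                          # number of stations in the question
n.pairs <- n * (n - 1) / 2          # distinct station pairs
bytes.per.double <- 8               # assumption: distances stored as doubles

# Full n x n matrix, as produced by as.matrix(...) on a distance object
full.matrix.mib <- n^2 * bytes.per.double / 1024^2
# One row of distances at a time, as in the row-by-row approach
one.row.kib <- (n - 1) * bytes.per.double / 1024

cat(sprintf("station pairs:          %.0f\n", n.pairs))
cat(sprintf("full distance matrix:   about %.0f MiB\n", full.matrix.mib))
cat(sprintf("one row of distances:   about %.0f KiB\n", one.row.kib))
```

Under these assumptions the full matrix needs several hundred MiB, while a single row of distances needs well under 1 MiB.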
Note that the actual distances are slightly different, probably because earth.dist(...) and distHaversine(...) use slightly different estimates for the radius of the earth. Also, note that both approaches here rely on station numbers for IDs. If the stations have names, the code will have to be modified slightly.
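The radius explanation can be checked using only the distances reported in this answer. If the two functions differ just in the radius they assume, the ratio of their results should be the same for every pair. The sketch below uses base R and the first five pairs from the two result tables. The implied radius is an inference from those printed numbers, not a value taken from the fossil documentation.

```r
# Distances in feet for the same five station pairs, copied from the two result tables
pair         <- c("1-2", "1-3", "1-4", "1-5", "2-3")
d.earth.dist <- c(154.13436, 78.46840, 175.31180, 42.45634, 75.66596)
d.haversine  <- c(154.26350, 78.53414, 175.45868, 42.49191, 75.72935)

# A constant ratio is what a pure difference in radius would produce
ratio <- d.haversine / d.earth.dist
print(data.frame(pair, ratio = round(ratio, 6)))

# Radius (km) that earth.dist(...) would need to use to explain the ratio,
# given the 6378137 m radius passed to distHaversine(...) in this answer
r.haversine.m <- 6378137
r.implied.km  <- r.haversine.m / mean(ratio) / 1000
print(round(r.implied.km, 1))
```

The ratios agree to about six decimal places, which is consistent with the two functions differing only in the assumed radius, by less than 0.1%.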
First Approach: Using earth.dist(...)
df = read.table(header=T,text="long lat
1 -74.20139 39.82806
2 -74.20194 39.82806
3 -74.20167 39.82806
4 -74.20197 39.82824
5 -74.20150 39.82814
6 -74.26472 39.66639
7 -74.17389 39.87111
8 -74.07224 39.97353
9 -74.07978 39.94554") # your sample data
library(fossil) # for earth.dist(...)
library(data.table)
sep.ft <- 200 # critical separation (feet); set to 100 for the threshold in the question
sep.km <- sep.ft*0.0003048 # critical separation (km)
m <- as.matrix(earth.dist(df)) # distance matrix in km
coloc <- data.table(which(m<sep.km, arr.ind=T)) # pairs of stations with dist<200 ft
setnames(coloc,c("row","col"),c("ST.1","ST.2")) # rename columns to reflect station IDs
coloc <- coloc[ST.1<ST.2,] # keep each pair once (row < col, i.e. the upper triangle); also drops the diagonal
coloc[,dist:=m[ST.1,ST.2]/0.0003048,by="ST.1,ST.2"] # append distances in feet
remove(m) # don't need distance matrix anymore...
stations <- data.table(id=as.integer(rownames(df)),df)
setkey(stations,id)
setkey(coloc,ST.1)
coloc[stations,c("long.1","lat.1"):=list(long,lat),nomatch=0]
setkey(coloc,ST.2)
coloc[stations,c("long.2","lat.2"):=list(long,lat),nomatch=0]
Produces this:
coloc
# ST.1 ST.2 dist long.1 lat.1 long.2 lat.2
# 1: 1 2 154.13436 -74.20139 39.82806 -74.20194 39.82806
# 2: 1 3 78.46840 -74.20139 39.82806 -74.20167 39.82806
# 3: 2 3 75.66596 -74.20194 39.82806 -74.20167 39.82806
# 4: 1 4 175.31180 -74.20139 39.82806 -74.20197 39.82824
# 5: 2 4 66.22069 -74.20194 39.82806 -74.20197 39.82824
# 6: 3 4 106.69018 -74.20167 39.82806 -74.20197 39.82824
# 7: 1 5 42.45634 -74.20139 39.82806 -74.20150 39.82814
# 8: 2 5 126.71608 -74.20194 39.82806 -74.20150 39.82814
# 9: 3 5 55.87449 -74.20167 39.82806 -74.20150 39.82814
# 10: 4 5 136.67612 -74.20197 39.82824 -74.20150 39.82814
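The table above identifies stations by row number. If the stations have names, one possible adjustment is to keep the numeric IDs for the distance calculation and look the names up afterwards. This is a minimal base R sketch: the station.names vector is hypothetical (the sample data has no names), and the pairs data frame is a small stand-in for the ST.1 and ST.2 columns.

```r
# Hypothetical names, one per row of the sample data (not part of the original data)
station.names <- paste0("STN-", LETTERS[1:9])

# A few ID pairs, as they appear in the ST.1 / ST.2 columns of the result
pairs <- data.frame(ST.1 = c(1L, 1L, 2L), ST.2 = c(2L, 3L, 3L))

# Row numbers index directly into the name vector
pairs$name.1 <- station.names[pairs$ST.1]
pairs$name.2 <- station.names[pairs$ST.2]
print(pairs)
```

This only works when the IDs are the row positions of the original data, which is how both approaches here construct them.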
Second Approach: Using distHaversine(...)
library(data.table)
library(geosphere)
sep.ft <- 200 # critical separation (feet)
stations <- data.table(id=as.integer(rownames(df)),df)
d <- function(x){ # distance between station[i] and all subsequent stations
  r.ft <- 6378137*3.28084 # radius of the earth, in feet
  if (x[1]==nrow(stations)) return() # don't process last row
  ref <- stations[(x[1]+1):nrow(stations),]
  z <- distHaversine(ref[,2:3,with=FALSE],x[2:3], r=r.ft)
  z <- data.table(ST.1=x[1], ST.2=ref$id, dist=z, long.1=x[2], lat.1=x[3], long.2=ref$long, lat.2=ref$lat)
  return(z[z$dist<sep.ft,])
}
coloc.2 = do.call(rbind,apply(stations,1,d))
Produces this:
coloc.2
# ST.1 ST.2 dist long.1 lat.1 long.2 lat.2
# 1: 1 2 154.26350 -74.20139 39.82806 -74.20194 39.82806
# 2: 1 3 78.53414 -74.20139 39.82806 -74.20167 39.82806
# 3: 1 4 175.45868 -74.20139 39.82806 -74.20197 39.82824
# 4: 1 5 42.49191 -74.20139 39.82806 -74.20150 39.82814
# 5: 2 3 75.72935 -74.20194 39.82806 -74.20167 39.82806
# 6: 2 4 66.27617 -74.20194 39.82806 -74.20197 39.82824
# 7: 2 5 126.82225 -74.20194 39.82806 -74.20150 39.82814
# 8: 3 4 106.77957 -74.20167 39.82806 -74.20197 39.82824
# 9: 3 5 55.92131 -74.20167 39.82806 -74.20150 39.82814
# 10: 4 5 136.79063 -74.20197 39.82824 -74.20150 39.82814
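For readers who want to see what the haversine calculation involves, or to spot-check a row of the table without installing geosphere, here is a minimal base R sketch of the standard haversine formula. The haversine() helper is written for illustration only and is not the geosphere implementation. It uses the same radius in feet as the code above.

```r
# Haversine great-circle distance.
# Inputs are decimal degrees; the result is in the same units as r.
haversine <- function(long1, lat1, long2, lat2, r) {
  to.rad <- pi / 180
  dlat   <- (lat2 - lat1) * to.rad
  dlong  <- (long2 - long1) * to.rad
  a <- sin(dlat / 2)^2 +
       cos(lat1 * to.rad) * cos(lat2 * to.rad) * sin(dlong / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))   # pmin guards against rounding just above 1
}

r.ft <- 6378137 * 3.28084   # radius of the earth in feet, as in the code above

d.12 <- haversine(-74.20139, 39.82806, -74.20194, 39.82806, r.ft)  # stations 1 and 2
d.15 <- haversine(-74.20139, 39.82806, -74.20150, 39.82814, r.ft)  # stations 1 and 5
print(round(c(d.12, d.15), 2))
```

The two values should agree closely with the dist column for pairs 1-2 and 1-5 in the table above.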