Here is a small example of a problem I am trying to solve; the real implementation is on a much larger database:
I have a sparse grid of points across the New World, with latitude and longitude defined as below.
LAT<-rep(-5:5*10, 5)
LON<-rep(seq(-140, -60, by=20), each=11)
I know the color of some points on my grid:
COLOR<-c(NA,NA,NA,"black",NA,NA,NA,NA,NA,"red",NA,NA,"green",NA,"blue","blue",NA,"blue",NA,NA,"yellow",NA,NA,"yellow",NA,
NA,NA,"blue",NA,NA,NA,NA,NA,NA,NA,"black",NA,"blue","blue",NA,"blue",NA,NA,"yellow",NA,NA,NA,NA,"red",NA,NA,"green",NA,"blue","blue")
data<-as.data.frame(cbind(LAT,LON,COLOR))
What I want to do is replace the NA values in COLOR with the color of the closest point (by distance) that has a known color. In the actual implementation I am not too worried about ties, though I suppose they are possible (I could probably fix those by hand).
Thanks
Yup.
First, build your data frame with data.frame() rather than cbind(), or everything gets coerced to character:
data<-data.frame(LAT=LAT,LON=LON,COLOR=COLOR)
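To see the coercion for yourself, here is a minimal sketch on a three-point toy example. The values and the names m and df are made up for illustration and are not part of the question's data.

```r
LAT <- c(-50, 0, 50)
COLOR <- c(NA, "red", NA)

# cbind() builds a matrix, and a matrix holds one type only,
# so the numbers are converted to character strings.
m <- cbind(LAT, COLOR)
is.character(m)

# data.frame() keeps each column's own type.
df <- data.frame(LAT = LAT, COLOR = COLOR)
is.numeric(df$LAT)
```

Numeric coordinates matter here because the nearest-neighbour search needs numbers to compute distances.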
Split the data frame up - you could probably do this in one go but this makes things a bit more obvious:
query = data[is.na(data$COLOR),]
colours = data[!is.na(data$COLOR),]
library(FNN)
neighs = get.knnx(colours[,c("LAT","LON")],query[,c("LAT","LON")],k=1)
Now insert the replacement colours directly into the data frame:
data[is.na(data$COLOR),"COLOR"]=colours$COLOR[neighs$nn.index]
plot(data$LON,data$LAT,col=data$COLOR,pch=19)
Note, however, that distance is being computed with Pythagorean (planar) geometry on lat-long, which isn't accurate because the earth isn't flat: a degree of longitude covers less ground the further you are from the equator. You might have to transform your coordinates to a projected system first.
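If you would rather not project the coordinates, one alternative is to compute great-circle distances directly. The following is a rough base R sketch, not the FNN approach above: haversine() and fill_nearest() are hypothetical helper names, the 6371 km radius assumes a spherical Earth, and the three toy points are invented so that the planar and great-circle answers differ.

```r
# Great-circle (haversine) distance in km between points given in degrees.
# radius = 6371 is a mean-sphere assumption.
haversine <- function(lat1, lon1, lat2, lon2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Fill each NA colour with the colour of the nearest known point.
fill_nearest <- function(df) {
  known <- which(!is.na(df$COLOR))
  missing <- which(is.na(df$COLOR))
  if (length(known) == 0 || length(missing) == 0) return(df)
  # Rows: points to fill; columns: points with a known colour.
  d <- outer(missing, known, function(i, j)
    haversine(df$LAT[i], df$LON[i], df$LAT[j], df$LON[j]))
  # which.min() takes the first minimum, so ties go to the earlier row.
  nearest <- known[apply(d, 1, which.min)]
  df$COLOR[missing] <- df$COLOR[nearest]
  df
}

# Toy data: in raw degrees the "blue" point looks closer (10 vs 15),
# but on the sphere the "red" point is nearer at latitude 60.
toy <- data.frame(LAT = c(60, 60, 50), LON = c(0, 15, 0),
                  COLOR = c(NA, "red", "blue"), stringsAsFactors = FALSE)
filled <- fill_nearest(toy)
filled$COLOR[1]
```

This builds the full distance matrix, so it is only practical for modest numbers of points; for a large database a projected coordinate system plus a k-d tree search such as get.knnx is likely to scale better.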
I came up with this solution, but Spacedman's seems much better. Note that I also assume the Earth is flat here :)
# First coerce to numeric from factor:
data$LAT <- as.numeric(as.character(data$LAT))
data$LON <- as.numeric(as.character(data$LON))
n <- nrow(data)
# Compute Euclidean distances:
Dist <- outer(1:n,1:n,function(i,j)sqrt((data$LAT[i]-data$LAT[j])^2 + (data$LON[i]-data$LON[j])^2))
# Dummy second data:
data2 <- data
# Loop over data to fill:
for (i in 1:n)
{
  if (is.na(data$COLOR[i]))
  {
    # Order all points by distance from point i and take the first known colour
    data$COLOR[i] <- data2$COLOR[order(Dist[i,])[!is.na(data2$COLOR[order(Dist[i,])])][1]]
  }
}
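The same idea can be written without the explicit loop by only computing distances from the missing points to the known ones. This is a sketch under the same flat-Earth assumption, shown on a made-up four-point data frame called toy4 rather than the data from the question.

```r
toy4 <- data.frame(LAT = c(0, 0, 10, 10),
                   LON = c(-100, -80, -100, -80),
                   COLOR = c("red", NA, NA, "blue"),
                   stringsAsFactors = FALSE)

known <- !is.na(toy4$COLOR)

# Planar distances: one row per missing point, one column per known point.
d <- sqrt(outer(toy4$LAT[!known], toy4$LAT[known], "-")^2 +
          outer(toy4$LON[!known], toy4$LON[known], "-")^2)

# For each missing point, pick the known point at minimum distance
# (which.min() returns the first one in case of a tie).
toy4$COLOR[!known] <- toy4$COLOR[known][apply(d, 1, which.min)]
toy4$COLOR
```

Restricting the matrix to missing-by-known pairs keeps it much smaller than the full n-by-n matrix when only a few colours are known.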