
R: replacing NA with value of closest point

Tags:

r

Here is an example of a problem I am trying to solve and then implement in a much larger database:

I have a sparse grid of points across the New World, with latitude and longitude defined as below.

LAT<-rep(-5:5*10, 5)
LON<-rep(seq(-140, -60, by=20), each=11)

I know the color of some points on my grid

COLOR<-c(NA,NA,NA,"black",NA,NA,NA,NA,NA,"red",NA,NA,"green",NA,"blue","blue",NA,"blue",NA,NA,"yellow",NA,NA,"yellow",NA,
  NA,NA,"blue",NA,NA,NA,NA,NA,NA,NA,"black",NA,"blue","blue",NA,"blue",NA,NA,"yellow",NA,NA,NA,NA,"red",NA,NA,"green",NA,"blue","blue")
data<-as.data.frame(cbind(LAT,LON,COLOR))

What I want to do is replace the NA values in COLOR with the color of the closest point (by distance) that has a known color. In the actual implementation I am not too worried about ties, though I suppose they are possible (I could probably fix those by hand).

Thanks

user1612278 asked Aug 20 '12 16:08


2 Answers

Yup.

First, make your data frame with data.frame; otherwise cbind coerces every column to character:

data<-data.frame(LAT=LAT,LON=LON,COLOR=COLOR)
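The coercion is easy to check on a tiny example. This is only a sketch: the names lat_demo and color_demo and their values are made up for illustration and are not the data from the question.

```r
# Made-up three-point example (hypothetical names and values).
lat_demo <- c(-50, 0, 50)
color_demo <- c(NA, "red", NA)

# cbind() builds a matrix, and a matrix can hold only one type,
# so the numbers end up stored as text.
m <- cbind(LAT = lat_demo, COLOR = color_demo)
is.character(m)

# data.frame() keeps each column's own type, so LAT stays numeric.
df <- data.frame(LAT = lat_demo, COLOR = color_demo)
is.numeric(df$LAT)
```

Both checks should return TRUE, which is why the distance calculations further down need the data.frame version (or an explicit conversion back to numeric).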

Split the data frame up - you could probably do this in one go but this makes things a bit more obvious:

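# Rows that need a colour, and rows that already have one:
query = data[is.na(data$COLOR),]
colours = data[!is.na(data$COLOR),]
library(FNN)
# For each query point, find the index of its single nearest coloured point (k=1):
neighs = get.knnx(colours[,c("LAT","LON")],query[,c("LAT","LON")],k=1)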

Now insert the replacement colours directly into the data dataframe:

data[is.na(data$COLOR),"COLOR"]=colours$COLOR[neighs$nn.index]
plot(data$LON,data$LAT,col=data$COLOR,pch=19)

Note, however, that distance is being computed with Pythagorean (flat-plane) geometry on the lat-long values, which is only an approximation because the earth isn't flat. You might have to transform your coordinates to something else first, such as a projected coordinate system.
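If a projection is not convenient, one alternative is to measure great-circle distance directly. The following is a minimal base-R sketch, not part of the FNN approach above: it swaps in the haversine formula and assumes a spherical earth of radius 6371 km. The function names haversine_km and fill_nearest and the three example points are hypothetical. It compares every missing point against every known point, so it may be slow on a very large table.

```r
# Great-circle (haversine) distance in km; inputs are in degrees.
# Assumption: spherical earth with radius 6371 km.
haversine_km <- function(lat1, lon1, lat2, lon2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: fill each NA with the value at the nearest known point.
fill_nearest <- function(lat, lon, value) {
  known <- which(!is.na(value))
  if (length(known) == 0) return(value)  # nothing to copy from
  for (i in which(is.na(value))) {
    d <- haversine_km(lat[i], lon[i], lat[known], lon[known])
    # 'known' was fixed before the loop, so only original values are copied
    value[i] <- value[known[which.min(d)]]
  }
  value
}

# Made-up example at high latitude, where flat and spherical distance disagree:
# in raw degrees the point at (55, 0) looks closer to (80, 0) (25 < 30),
# but on the sphere the point at (80, 30) is much nearer.
lat <- c(80, 80, 55)
lon <- c(0, 30, 0)
col <- c(NA, "red", "blue")
filled <- fill_nearest(lat, lon, col)
```

On the data frame from this answer the call would be data$COLOR <- fill_nearest(data$LAT, data$LON, data$COLOR), assuming LAT and LON are numeric.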

Spacedman answered Oct 03 '22 10:10


I came up with this solution, but Spacedman's seems much better. Note that I also assume the Earth is flat here :)

# First coerce LAT and LON to numeric (as.data.frame(cbind(...)) leaves them as factor or character):
data$LAT <- as.numeric(as.character(data$LAT))
data$LON <- as.numeric(as.character(data$LON))

n <- nrow(data)

# Compute Euclidean distances:
Dist <- outer(1:n,1:n,function(i,j)sqrt((data$LAT[i]-data$LAT[j])^2 + (data$LON[i]-data$LON[j])^2))

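# Keep an untouched copy so that filled-in values are never used as sources:
data2 <- data

# Loop over data to fill:
for (i in seq_len(n))
{
  if (is.na(data$COLOR[i]))
  {
    # All points ordered nearest first, keeping only those with a known colour:
    ord <- order(Dist[i,])
    ord <- ord[!is.na(data2$COLOR[ord])]
    data$COLOR[i] <- data2$COLOR[ord[1]]
  }
}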
Sacha Epskamp answered Oct 03 '22 11:10