Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R: Bilinear interpolation to fill gaps in R

I have a grid that contains gaps (NAs) that I want to fill using interpolation. My grid shows autocorrelation in the x and y dimensions, so I would like to try bilinear interpolation. Most of the solutions I have found are focused on 'upsampling' (interpolation for the purpose of increasing number of samples/size of grid), but I do not want/need to change the grid size. I just want to fill NAs using interpolation. Other potential solutions do not seem to handle NAs for the input grid of values (the 'z matrix'), or are neighborhood-based solutions rather than bilinear interpoloation, or simply have no answer.

I found that with the raster package, I can input a grid (as a raster) that contains NAs, and use the 'resample' command to output a grid of the same size. However, the results look like nearest neighbor interpolation rather than bilinear interpolation.

Am I missing something such that there is a way to do bilinear interpolation with the raster package? Or is there a better way to do bilinear interpolation simply to fill NAs?

library(raster)

# raster containing gap
r <- raster(nrow=10, ncol=10)
r[] <- 1:ncell(r)
r[25] <- NA

# The s raster is the same size as the r raster
s <- raster(nrow=10, ncol=10)
s <- resample(r, s, method='bilinear')
plot(r)
plot(s)
s[25]
s[35]
# s[25] appears to have been filled with neighbor s[35]

UPDATE

The Akima package seems like a promising alternative to the raster approach above, but I'm having trouble if there are NAs in the input grid of values (the Z matrix). Here's an example parallel to the example above to demonstrate. (Again, I'm interpolating to a grid the same size as the original).

library(akima)

# Use bilinear interpolation (no NAs in input)
rmat<-matrix(seq(1,100,1), nrow = 10, ncol = 10, byrow = T)
x <- seq(1,10,1)
y <- seq(1,10,1)
smat <- bilinear.grid(x, y, rmat, nx = 10, ny = 10) # works
plot(raster(rmat), main = "original")
plot(raster(smat$z), main = "interpolated")

# Try using bilinear interpolation but with an NA
rmat<-matrix(seq(1,100,1), nrow = 10, ncol = 10, byrow = T)
rmat[3,5] <- NA
x <- seq(1,10,1)
y <- seq(1,10,1)
smat <- bilinear.grid(x, y, rmat, nx = 10, ny = 10) # Error about NAs

UPDATE2

There was a great question from @Robert Hijmans about why not use a moving window average with the focal() command in the raster package. The reason is that I want to try bilinear interpolation, and I don't think a moving window average always gives the same answer as bilinear interpolation. However, this was not clear in the example I posted (in that example moving window and bilinear interp do give the same answer), so I'll demonstrate in a new example below. Note that the bilinear interpolation solution should be 8 for the example below (here is a handy calculator for tests).

library(raster)
r <- raster(nrow=10, ncol=10)

# Different grid values than earlier examples
values(r) <- c(rep(1:5, 4), rep(4:8, 4), rep(1:5, 4), rep(4:8, 4), rep(1:5, 4))
r[25] <- NA
plot(r)

# See what the mean of the moving window produces
f <- focal(r, w=matrix(1,nrow=3, ncol=3), fun=mean, NAonly=TRUE, na.rm=TRUE) 
f[25] # Moving window gives 5 but bilinear interp gives 8

# Note that this seems to be how the moving window works with equal weights
window_test <- c(r[14:16], r[24:26], r[34:36])
 mean(window_test, na.rm = T)

Am I missing something here? Maybe there is something clever with the weights argument of focal() that can produce a bilinear interpolation solution?

like image 275
user2860703 Avatar asked Jun 16 '18 23:06

user2860703


1 Answers

Let's use equal distance cells to avoid differences because of cell size variation with lon/lat data

library(raster)
r <- raster(nrow=10, ncol=10, crs='+proj=utm +zone=1 +datum=WGS84', xmn=0, xmx=1, ymn=0, ymx=1)

For this example, you might use focal

values(r) <- 1:ncell(r)
r[25] <- NA
f <- focal(r, w=matrix(1,nrow=3, ncol=3), fun=mean, NAonly=TRUE, na.rm=TRUE) 

I see that you dismiss "neighborhood-based solutions rather than bilinear interpoloation". But the question is why. In this case, you may want a neighborhood-based solution.

Update. Then again, in case of cells that are not approximately square, bilinear would be preferable.

values(r) <- c(rep(1:5, 4), rep(4:8, 4), rep(1:5, 4), rep(4:8, 4), rep(1:5, 4))
r[25] <- NA

The problem with bilinear interpolation normally uses 4 contiguous cells, but in this case, where you want the value for the center of a cell, the appropriate cell would be the value of the cell itself, because the distance to that cell is zero, and thus that is where the interpolation ends up. For example, for cell 23

extract(r, xyFromCell(r, 23))
#6
extract(r, xyFromCell(r, 23), method='bilinear')
#[1] 6

In this case the focal cell is NA, so you get the average of the focal cell and 3 more cells. The question is which three? It is arbitrary, but to make it work, the NA cell must get a value. The raster algorithm assigns the value below the NA cell to that cell (also 8 here). This works well, I think, to deal with NA values at edges (e.g. land/ocean), but perhaps not in this case. ` extract(r, xyFromCell(r, 25)) #NA extract(r, xyFromCell(r, 25), method='bilinear') #[1] 8

That is also what resample gives

resample(r, r)[25]
# 8

Is this what the on-line calculator suggests too?

This is very sensitive to small changes

extract(r, xyFromCell(r, 25)+0.0001, method='bilinear')
#[1] 4.998997

What I would really want in this case is the mean of the rook-neighbors

mean(r[adjacent(r, 25, pairs=FALSE)])
[1] 6

Or, more generally, the local inverse distance weighted average. You can compute that by setting up a weights matrix with focal

# compute weights matrix
a <- sort(adjacent(r, 25, 8, pairs=F, include=TRUE))
axy <- xyFromCell(r, a)
d <- pointDistance(axy, xyFromCell(r, 25), lonlat=F)
w <- matrix(d, 3, 3)
w[2,2] <- 0
w <- w / sum(w)

# A simpler approach could be: 
# w <- matrix(c(0,.25,0,.25,0,.25,0,.25,0), 3, 3)


foc <- focal(r, w, na.rm=TRUE, NAonly=TRUE)
foc[25]

In this example this is fine; but it would not be correct if there were multiple NA values in the focal area (as the sum of weights would no longer be 1). We can correct for that by computing the sum of weights

x <- as.integer(r/r)
sum_weights <- focal(x, w, na.rm=TRUE, NAonly=TRUE)

fw <- foc/sum_weights
done <- cover(r, fw)
done[25]
like image 61
Robert Hijmans Avatar answered Nov 04 '22 00:11

Robert Hijmans