I have a matrix, e.g.:
set.seed(1)
m = matrix(rep(NA,100), nrow=10)
m[sample(1:100,10)] = 1
m
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] NA NA NA NA NA NA NA NA NA NA
[2,] NA NA NA NA NA NA 1 NA NA NA
[3,] NA NA NA NA NA NA NA NA NA NA
[4,] NA NA NA NA NA NA NA NA NA NA
[5,] NA NA NA NA NA NA NA NA NA NA
[6,] 1 NA NA NA NA NA NA NA 1 NA
[7,] NA NA 1 1 NA 1 NA NA NA 1
[8,] NA NA NA NA NA 1 NA NA NA NA
[9,] NA NA NA NA NA NA NA NA 1 NA
[10,] NA 1 NA NA NA NA NA NA NA NA
I want to convert all NA values that are next (adjacent) to a non-NA value to zero. Is there any swishy matrix way to achieve this, without some hideous row-wise and col-wise looping algorithm?
NB. I've re-worked the example to be less ambiguous. I need all NA values above, below, left, and right of a non-NA value to become zero!
Adjacency Matrix Definition. The adjacency matrix, also called as the connection matrix, is a matrix containing rows and columns which is used to represent a simple labelled graph, with 0 or 1 in the position of (V i , V j) according to the condition whether V i and V j are adjacent or not.
An (a, b, c) -adjacency matrix A of a simple graph has Ai,j = a if (i, j) is an edge, b if it is not, and c on the diagonal. The Seidel adjacency matrix is a (−1, 1, 0) -adjacency matrix.
The connection matrix is considered as a square array where each row represents the out-nodes of a graph and each column represents the in-nodes of a graph. Entry 1 represents that there is an edge between two nodes. The adjacency matrix for an undirected graph is symmetric.
Note: In an adjacency matrix, 0 represents that there is no association is exists between two nodes, whereas 1 represents that there is an association is exists between two nodes. How to create an adjacency matrix?
m[is.na(m) & !(cbind(is.na(m[,-1L]),T) & cbind(T,is.na(m[,-ncol(m)])) & rbind(is.na(m[-1L,]),T) & rbind(T,is.na(m[-nrow(m),])))] <- 0;
m;
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] NA NA NA NA NA NA 0 NA NA NA
## [2,] NA NA NA NA NA 0 1 0 NA NA
## [3,] NA NA NA NA NA NA 0 NA NA NA
## [4,] NA NA NA NA NA NA NA NA NA NA
## [5,] 0 NA NA NA NA NA NA NA 0 NA
## [6,] 1 0 0 0 NA 0 NA 0 1 0
## [7,] 0 0 1 1 0 1 0 NA 0 1
## [8,] NA NA 0 0 0 1 0 NA 0 0
## [9,] NA 0 NA NA NA 0 NA 0 1 0
## [10,] 0 1 0 NA NA NA NA NA 0 NA
The solution works as follows.
We construct a logical index matrix with TRUE
where an element is NA AND is adjacent to (above, below, left, or right of) at least one non-NA element. We can then subscript m
with the logical index matrix and assign the desired replacement value.
The LHS of the logical conjunction is easy; it is simply is.na(m)
.
The RHS of the logical conjunction is the trickiest part. We need to perform 4 tests, one for each direction of adjacency. The general algorithm is:
1: Index off the singular index of the adjacency direction that is not adjacent to any other index with respect to that adjacency direction. For example, for the "right direction", we index off the leftmost column, because it is not to the right of any other index. In other words, there is no column that has the leftmost column on its right, so we can ignore it (and must remove it) for the "right direction" computation.
2: Test the submatrix for NAs using is.na()
.
3: We then must bind (cbind()
for horizontal adjacency directions, rbind()
for vertical) TRUE
on the opposite side (that is, opposite of the index that was removed in step 1) of the resulting logical submatrix. This effectively causes the last index in the adjacency direction to always have a (pseudo-)NA in its adjacency direction, so it will never be replaced due to that adjacency direction.
4: Logical AND the 4 tests. The result will be a logical matrix with TRUE
for elements that have an NA in every adjacent cell.
5: Negate the result of step 4. This will produce a logical matrix with TRUE
for elements that have at least one non-NA in any adjacent cell.
Note that there's another way to do this, which is perhaps slightly more intuitive. We can write each of the 4 tests to test for non-NA, as opposed to NA, and then logical OR them together. This would also require binding FALSE
instead of TRUE
for the last index. It would look like this:
m[is.na(m) & (cbind(!is.na(m[,-1L]),F) | cbind(F,!is.na(m[,-ncol(m)])) | rbind(!is.na(m[-1L,]),F) | rbind(F,!is.na(m[-nrow(m),])))] <- 0;
m;
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] NA NA NA NA NA NA 0 NA NA NA
## [2,] NA NA NA NA NA 0 1 0 NA NA
## [3,] NA NA NA NA NA NA 0 NA NA NA
## [4,] NA NA NA NA NA NA NA NA NA NA
## [5,] 0 NA NA NA NA NA NA NA 0 NA
## [6,] 1 0 0 0 NA 0 NA 0 1 0
## [7,] 0 0 1 1 0 1 0 NA 0 1
## [8,] NA NA 0 0 0 1 0 NA 0 0
## [9,] NA 0 NA NA NA 0 NA 0 1 0
## [10,] 0 1 0 NA NA NA NA NA 0 NA
The first approach is preferable because it requires only a single negation, whereas the second approach requires 4 negations.
library(raster);
library(microbenchmark);
bgoldst1 <- function(m) { m[is.na(m) & !(cbind(is.na(m[,-1L]),T) & cbind(T,is.na(m[,-ncol(m)])) & rbind(is.na(m[-1L,]),T) & rbind(T,is.na(m[-nrow(m),])))] <- 0; m; };
bgoldst2 <- function(m) { m[is.na(m) & (cbind(!is.na(m[,-1L]),F) | cbind(F,!is.na(m[,-ncol(m)])) | rbind(!is.na(m[-1L,]),F) | rbind(F,!is.na(m[-nrow(m),])))] <- 0; m; };
geotheory <- function(m) { r <- raster(m,crs='+init=epsg:27700'); extent(r) <- extent(1,ncol(m),1,nrow(m)); b <- as.matrix(buffer(r,1)); m[is.na(m) & !is.na(b)] <- 0; m; };
set.seed(1L); m <- matrix(rep(NA,100),nrow=10L); m[sample(1:100,10L)] <- 1;
expected <- bgoldst1(m);
identical(expected,bgoldst2(m));
## [1] TRUE
identical(expected,geotheory(m));
## [1] TRUE
microbenchmark(bgoldst1(m),bgoldst2(m),geotheory(m));
## Unit: microseconds
## expr min lq mean median uq max neval
## bgoldst1(m) 89.380 96.0085 110.0142 107.9825 119.1015 197.149 100
## bgoldst2(m) 87.242 97.5055 111.4725 107.3410 121.2410 176.194 100
## geotheory(m) 5010.376 5519.7095 6017.3685 5824.4115 6289.9115 9013.201 100
set.seed(1L); NR <- 100L; NC <- 100L; probNA <- 0.9; m <- matrix(sample(c(1,NA),NR*NC,T,c(1-probNA,probNA)),NR);
expected <- bgoldst1(m);
identical(expected,bgoldst2(m));
## [1] TRUE
identical(expected,geotheory(m));
## [1] TRUE
microbenchmark(bgoldst1(m),bgoldst2(m),geotheory(m));
## Unit: milliseconds
## expr min lq mean median uq max neval
## bgoldst1(m) 6.815069 7.053484 7.265562 7.100954 7.220269 8.930236 100
## bgoldst2(m) 6.920270 7.071018 7.381712 7.127683 7.217275 16.034825 100
## geotheory(m) 56.505277 57.989872 66.803291 58.494288 59.451588 571.142534 100
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With