Conditional matrix adjacency calculation

Tags:

matrix

I have a matrix, e.g.:

set.seed(1)
m = matrix(rep(NA,100), nrow=10)
m[sample(1:100,10)] = 1
m
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [2,]   NA   NA   NA   NA   NA   NA    1   NA   NA    NA
 [3,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [4,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [5,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
 [6,]    1   NA   NA   NA   NA   NA   NA   NA    1    NA
 [7,]   NA   NA    1    1   NA    1   NA   NA   NA     1
 [8,]   NA   NA   NA   NA   NA    1   NA   NA   NA    NA
 [9,]   NA   NA   NA   NA   NA   NA   NA   NA    1    NA
[10,]   NA    1   NA   NA   NA   NA   NA   NA   NA    NA

I want to convert all NA values that are next (adjacent) to a non-NA value to zero. Is there any swishy matrix way to achieve this, without some hideous row-wise and col-wise looping algorithm?

NB. I've re-worked the example to be less ambiguous. I need all NA values above, below, left, and right of a non-NA value to become zero!

894

asked May 13 '16 22:05

1 Answers

m[is.na(m) & !(cbind(is.na(m[,-1L]),T) & cbind(T,is.na(m[,-ncol(m)])) & rbind(is.na(m[-1L,]),T) & rbind(T,is.na(m[-nrow(m),])))] <- 0;
m;
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]   NA   NA   NA   NA   NA   NA    0   NA   NA    NA
##  [2,]   NA   NA   NA   NA   NA    0    1    0   NA    NA
##  [3,]   NA   NA   NA   NA   NA   NA    0   NA   NA    NA
##  [4,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
##  [5,]    0   NA   NA   NA   NA   NA   NA   NA    0    NA
##  [6,]    1    0    0    0   NA    0   NA    0    1     0
##  [7,]    0    0    1    1    0    1    0   NA    0     1
##  [8,]   NA   NA    0    0    0    1    0   NA    0     0
##  [9,]   NA    0   NA   NA   NA    0   NA    0    1     0
## [10,]    0    1    0   NA   NA   NA   NA   NA    0    NA

The solution works as follows.

We construct a logical index matrix with TRUE where an element is NA AND is adjacent to (above, below, left, or right of) at least one non-NA element. We can then subscript m with the logical index matrix and assign the desired replacement value.

The LHS of the logical conjunction is easy; it is simply is.na(m).

The RHS of the logical conjunction is the trickiest part. We need to perform 4 tests, one for each direction of adjacency. The general algorithm is:

1: Index off the singular index of the adjacency direction that is not adjacent to any other index with respect to that adjacency direction. For example, for the "right direction", we index off the leftmost column, because it is not to the right of any other index. In other words, there is no column that has the leftmost column on its right, so we can ignore it (and must remove it) for the "right direction" computation.

2: Test the submatrix for NAs using is.na().

3: We then must bind (cbind() for horizontal adjacency directions, rbind() for vertical) TRUE on the opposite side (that is, opposite of the index that was removed in step 1) of the resulting logical submatrix. This effectively causes the last index in the adjacency direction to always have a (pseudo-)NA in its adjacency direction, so it will never be replaced due to that adjacency direction.

4: Logical AND the 4 tests. The result will be a logical matrix with TRUE for elements that have an NA in every adjacent cell.

5: Negate the result of step 4. This will produce a logical matrix with TRUE for elements that have at least one non-NA in any adjacent cell.

Note that there's another way to do this, which is perhaps slightly more intuitive. We can write each of the 4 tests to test for non-NA, as opposed to NA, and then logical OR them together. This would also require binding FALSE instead of TRUE for the last index. It would look like this:

m[is.na(m) & (cbind(!is.na(m[,-1L]),F) | cbind(F,!is.na(m[,-ncol(m)])) | rbind(!is.na(m[-1L,]),F) | rbind(F,!is.na(m[-nrow(m),])))] <- 0;
m;
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]   NA   NA   NA   NA   NA   NA    0   NA   NA    NA
##  [2,]   NA   NA   NA   NA   NA    0    1    0   NA    NA
##  [3,]   NA   NA   NA   NA   NA   NA    0   NA   NA    NA
##  [4,]   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
##  [5,]    0   NA   NA   NA   NA   NA   NA   NA    0    NA
##  [6,]    1    0    0    0   NA    0   NA    0    1     0
##  [7,]    0    0    1    1    0    1    0   NA    0     1
##  [8,]   NA   NA    0    0    0    1    0   NA    0     0
##  [9,]   NA    0   NA   NA   NA    0   NA    0    1     0
## [10,]    0    1    0   NA   NA   NA   NA   NA    0    NA

The first approach is preferable because it requires only a single negation, whereas the second approach requires 4 negations.

Benchmarking

library(raster);
library(microbenchmark);

bgoldst1 <- function(m) { m[is.na(m) & !(cbind(is.na(m[,-1L]),T) & cbind(T,is.na(m[,-ncol(m)])) & rbind(is.na(m[-1L,]),T) & rbind(T,is.na(m[-nrow(m),])))] <- 0; m; };
bgoldst2 <- function(m) { m[is.na(m) & (cbind(!is.na(m[,-1L]),F) | cbind(F,!is.na(m[,-ncol(m)])) | rbind(!is.na(m[-1L,]),F) | rbind(F,!is.na(m[-nrow(m),])))] <- 0; m; };
geotheory <- function(m) { r <- raster(m,crs='+init=epsg:27700'); extent(r) <- extent(1,ncol(m),1,nrow(m)); b <- as.matrix(buffer(r,1)); m[is.na(m) & !is.na(b)] <- 0; m; };

set.seed(1L); m <- matrix(rep(NA,100),nrow=10L); m[sample(1:100,10L)] <- 1;

expected <- bgoldst1(m);
identical(expected,bgoldst2(m));
## [1] TRUE
identical(expected,geotheory(m));
## [1] TRUE

microbenchmark(bgoldst1(m),bgoldst2(m),geotheory(m));
## Unit: microseconds
##          expr      min        lq      mean    median        uq      max neval
##   bgoldst1(m)   89.380   96.0085  110.0142  107.9825  119.1015  197.149   100
##   bgoldst2(m)   87.242   97.5055  111.4725  107.3410  121.2410  176.194   100
##  geotheory(m) 5010.376 5519.7095 6017.3685 5824.4115 6289.9115 9013.201   100

set.seed(1L); NR <- 100L; NC <- 100L; probNA <- 0.9; m <- matrix(sample(c(1,NA),NR*NC,T,c(1-probNA,probNA)),NR);

expected <- bgoldst1(m);
identical(expected,bgoldst2(m));
## [1] TRUE
identical(expected,geotheory(m));
## [1] TRUE

microbenchmark(bgoldst1(m),bgoldst2(m),geotheory(m));
## Unit: milliseconds
##          expr       min        lq      mean    median        uq        max neval
##   bgoldst1(m)  6.815069  7.053484  7.265562  7.100954  7.220269   8.930236   100
##   bgoldst2(m)  6.920270  7.071018  7.381712  7.127683  7.217275  16.034825   100
##  geotheory(m) 56.505277 57.989872 66.803291 58.494288 59.451588 571.142534   100

answered Sep 17 '22 20:09

bgoldst

Related questions
                            
                                When does the object returned by invisible() cease to be invisible?
                            
                                Equivalent to \Sexpr{} for Python, etc., in knitr + RMarkdown?
                            
                                Is it possible to sample from a conditional density in R given some conditional data?
                            
                                Create ggplot2 plot in memory?
                            
                                Pass column name to function from mutate_each
                            
                                R: How to change plot background color for a specific range in ggvis shiny app
                            
                                force "apply" to return a matrix?
                            
                                What causes this ggplot2 facet bug?
                            
                                Calculating partial correlation adjusted for a categorical variable
                            
                                How to Make RStudio Presentation Self-contained?
                            
                                Regression table in latex from splm
                            
                                how to override the 2GB memory limit when R starts
                            
                                User-specified attributes of data.table get removed
                            
                                Fastest way to apply function to all pairwise combinations of columns
                            
                                dplyr summarise_each() using multiple functions for different column subsets across the same groups
                            
                                Error in 1:object$nsdf : argument of length 0 when using plot.gam [duplicate]
                            
                                Axis labels not showing up
                            
                                how to include Leaflet (for R)-output into RMarkdown presentation?
                            
                                R, knitr doesn't respect order of chunks and text
                            
                                Creating table graph in R with plotly

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With