 

nested ifelse statement with programmatic number of nested levels

Tags:

r

I have a matrix with 3 columns. For each row, the first non-missing value should be selected: if column 1 has no value, column 2 is searched, then column 3. The search order is given by the user.

I am moderately happy with my convoluted nested ifelse approach. Alas, it only works when the number of given columns matches the number of nested ifelse levels. The number of columns should be flexible (and thus so should the number of nested ifelse statements). That means if the user selects only one or two columns, a row should yield NA when all selected columns are missing, even if an unselected column contains a value.

foo_mat <- structure(c(
  NA, 30L, 15, 0, NA, 100L, 87L, NA, 0, NA, 2L, NA,
  10, 0, NA
), .Dim = c(5L, 3L), .Dimnames = list(NULL, c(
  "a", "b", "c"
)))

foo <- function(x, preced) {
    ifelse(!is.na(x[, preced[1]]), x[, preced[1]],
      ifelse(!is.na(x[, preced[2]]), x[, preced[2]],
        x[, preced[3]]
      )
    )
}

foo_mat
#>       a   b  c
#> [1,] NA 100  2
#> [2,] 30  87 NA
#> [3,] 15  NA 10
#> [4,]  0   0  0
#> [5,] NA  NA NA

foo(foo_mat, c("a", "c", "b"))
#> [1]  2 30 15  0 NA

foo(foo_mat, preced = c("b", "a"))
#> Error in x[, preced[3]]: subscript out of bounds #(of course)

# desired output
#> [1]  100 87 15 0 NA
tjebo asked Dec 17 '22 12:12


1 Answer

Base R:

apply(foo_mat[,c("a","c","b")], 1, function(z) c(na.omit(z), NA)[1])
# [1]  2 30 15  0 NA

The anonymous function works in two steps:

  • first, remove any NAs, so that we can grab the first non-NA value
  • second, guard against the empty case: if every value in the row is NA, na.omit(.) returns a zero-length vector, which is not what you want. Wrapping it as c(., NA)[1] guarantees at least one value in the c(.) vector, and we take the first of them; if na.omit returns nothing, we still get the one NA.
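The two steps can be traced on single rows. This is only a sketch: the vectors z_some and z_none are made-up stand-ins for one row with values and one row that is entirely NA, and unname() is added purely to make the results easier to read.

```r
# Hypothetical rows: one with values, one that is entirely NA
z_some <- c(a = NA, c = 2, b = 100)
z_none <- c(a = NA_real_, c = NA_real_, b = NA_real_)

# Step 1: drop the NAs
kept_some <- na.omit(z_some)   # keeps c = 2 and b = 100
kept_none <- na.omit(z_none)   # nothing left: a zero-length vector
length(kept_none)

# Step 2: append one NA as a fallback, then take the first element
first_some <- unname(c(kept_some, NA)[1])   # 2
first_none <- unname(c(kept_none, NA)[1])   # NA
```

Without the appended NA, indexing the empty vector with [1] would also give NA, but the explicit fallback makes the intent visible and keeps the result length at exactly one.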

apply(foo_mat, 1, ...) runs this function on each row. You control the preference order by rearranging the columns passed to apply, as in my use of foo_mat[,c("a","c","b")].
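As an illustration, selecting only two columns in the order b, a should reproduce the desired output from the question. The matrix is rebuilt here with matrix() so the snippet runs on its own; res_ba is just an illustrative name.

```r
# Same values as foo_mat in the question, filled column by column
foo_mat <- matrix(
  c(NA, 30, 15, 0, NA,     # column a
    100, 87, NA, 0, NA,    # column b
    2, NA, 10, 0, NA),     # column c
  ncol = 3, dimnames = list(NULL, c("a", "b", "c"))
)

# Only b and a are searched, in that order; column c is ignored
res_ba <- apply(foo_mat[, c("b", "a")], 1, function(z) c(na.omit(z), NA)[1])
res_ba
# [1] 100  87  15   0  NA
```

Row 5 stays NA because neither selected column has a value there.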

As a function:

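# preced defaults to all columns in their existing order
# (colnames() works for both matrices and data frames)
foo <- function(data, preced = colnames(data)) {
  apply(data[, preced, drop = FALSE], 1, function(z) c(na.omit(z), NA)[1])
}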
foo(foo_mat, c("a", "c", "b"))
# [1]  2 30 15  0 NA

(The drop=FALSE is defensive. By default, base R returns a plain vector from foo_mat[,"a"] instead of a 1-column matrix. This breaks many things, including apply. Adding drop=FALSE prevents that default reduction.)
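A small sketch of what drop=FALSE changes when a single column is selected. The names as_vector, as_matrix, err, and one_col are illustrative only, and the matrix is rebuilt so the snippet is self-contained.

```r
foo_mat <- matrix(
  c(NA, 30, 15, 0, NA,
    100, 87, NA, 0, NA,
    2, NA, 10, 0, NA),
  ncol = 3, dimnames = list(NULL, c("a", "b", "c"))
)

as_vector <- foo_mat[, "a"]                 # dimensions are dropped
as_matrix <- foo_mat[, "a", drop = FALSE]   # still a 5 x 1 matrix

dim(as_vector)   # NULL
dim(as_matrix)   # 5 1

# apply() needs a dim attribute, so the plain vector makes it stop;
# the error is caught here and its message kept as a string
err <- tryCatch(
  apply(as_vector, 1, function(z) c(na.omit(z), NA)[1]),
  error = function(e) conditionMessage(e)
)

# With drop = FALSE the single-column case works like any other
one_col <- apply(as_matrix, 1, function(z) c(na.omit(z), NA)[1])
one_col
# [1] NA 30 15  0 NA
```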

An alternative that is about as fast as the other answers:

foo <- function(data, preced) apply(data[,preced,drop=FALSE], 1, function(z) z[!is.na(z)][1])

Same functionality, fewer calls, simple logic.

(Attribution: this alternative is a combination of work from @tmfmnk, @Tjebo, and me. Thanks!)
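For readers who want to stay close to the nested ifelse from the question, the nesting can also be generated programmatically with Reduce. This is a sketch of a different technique, not part of the answer above, and the function name coalesce_cols is made up for illustration.

```r
foo_mat <- matrix(
  c(NA, 30, 15, 0, NA,
    100, 87, NA, 0, NA,
    2, NA, 10, 0, NA),
  ncol = 3, dimnames = list(NULL, c("a", "b", "c"))
)

# Hypothetical helper: start with the first preferred column, then fill
# the remaining NAs from each later column in turn. Each Reduce step
# plays the role of one ifelse level, so the depth follows length(preced).
coalesce_cols <- function(x, preced) {
  Reduce(
    function(acc, col) ifelse(is.na(acc), x[, col], acc),
    preced[-1],
    x[, preced[1]]
  )
}

coalesce_cols(foo_mat, c("a", "c", "b"))
# [1]  2 30 15  0 NA
coalesce_cols(foo_mat, c("b", "a"))
# [1] 100  87  15   0  NA
coalesce_cols(foo_mat, "c")
# [1]  2 NA 10  0 NA
```

Because each step works on whole columns, this avoids the row-by-row apply call; whether that is faster for a given matrix would need to be measured.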

r2evans answered Jan 25 '23 23:01