I have a matrix with 3 columns. For each row, I want to select the first non-missing value: if column 1 has no value, column 2 is searched, then column 3. The search order is given by the user.
I am moderately happy with my convoluted nested ifelse approach. Alas, it only works when exactly three columns are given. The number of columns should be flexible (and with it the number of nested ifelse calls): if the user selects only one or two columns, the result should be NA even if an unselected column contains a value.
foo_mat <- structure(c(
  NA, 30L, 15, 0, NA, 100L, 87L, NA, 0, NA, 2L, NA,
  10, 0, NA
), .Dim = c(5L, 3L), .Dimnames = list(NULL, c(
  "a", "b", "c"
)))
foo <- function(x, preced) {
    ifelse(!is.na(x[, preced[1]]), x[, preced[1]],
      ifelse(!is.na(x[, preced[2]]), x[, preced[2]],
        x[, preced[3]]
      )
    )
}
foo_mat
#>       a   b  c
#> [1,] NA 100  2
#> [2,] 30  87 NA
#> [3,] 15  NA 10
#> [4,]  0   0  0
#> [5,] NA  NA NA
foo(foo_mat, c("a", "c", "b"))
#> [1]  2 30 15  0 NA
foo(foo_mat, preced = c("b", "a"))
#> Error in x[, preced[3]]: subscript out of bounds #(of course)
# desired output
#> [1]  100 87 15 0 NA
Base R:
apply(foo_mat[,c("a","c","b")], 1, function(z) c(na.omit(z), NA)[1])
# [1]  2 30 15  0 NA
The anonymous function is a two-step process:
1. na.omit(z) removes the NAs, so that we can grab the first non-NA value.
2. If every value in a row is NA, na.omit(.) returns a zero-length vector, which is not what you want. The c(., NA)[1] ensures that after na.omit(.) we always have at least one value in the c(.) vector, and we take the first of them; if na.omit returns nothing, then at least we have the one NA.
Doing this row-wise is done with apply(foo_mat, 1, ...). You control the preference order by re-arranging the columns going into the apply data, as in my use of foo_mat[,c("a","c","b")].
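The two steps of the anonymous function can be checked in isolation. This is a minimal sketch on two made-up vectors; some_na and all_na are illustrative names, not part of the question's data.

```r
# Illustrative vectors; the names are assumptions for this sketch only.
some_na <- c(NA, 2, 100)
all_na  <- c(NA_real_, NA_real_, NA_real_)

# Step 1: na.omit() drops the NAs, so element 1 is the first non-NA value.
first_some <- c(na.omit(some_na), NA)[1]

# Step 2: with only NAs, na.omit() leaves nothing behind; the appended NA
# guarantees that [1] still has a value to return.
n_left    <- length(na.omit(all_na))
first_all <- c(na.omit(all_na), NA)[1]

first_some  # 2
n_left      # 0
first_all   # NA
```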
As a function:
foo <- function(data, preced = colnames(data)) apply(data[,preced,drop=FALSE], 1, function(z) c(na.omit(z), NA)[1])
foo(foo_mat, c("a", "c", "b"))
# [1]  2 30 15  0 NA
(The drop=FALSE is defensive. By default, base R returns a vector from foo_mat[,"a"] instead of a 1-column matrix. This breaks many things, including apply, so adding drop=FALSE prevents the default reduction to a vector.)
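A short sketch of what drop=FALSE changes, using a small stand-in matrix m (an assumption for illustration, not the question's foo_mat):

```r
# Small stand-in matrix with two of the question's column names.
m <- matrix(c(NA, 30, 100, 87), nrow = 2, dimnames = list(NULL, c("a", "b")))

one_col_default <- m[, "a"]                # dimensions are dropped: plain vector
one_col_kept    <- m[, "a", drop = FALSE]  # still a 2 x 1 matrix

dim(one_col_default)  # NULL
dim(one_col_kept)     # 2 1

# apply() needs an object with dimensions, so the dropped vector fails ...
failed <- inherits(try(apply(one_col_default, 1, identity), silent = TRUE),
                   "try-error")

# ... while the drop = FALSE version works even for a single column.
res <- apply(one_col_kept, 1, function(z) c(na.omit(z), NA)[1])
res  # NA 30
```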
An alternative that is about as fast as the other answers:
foo <- function(data, preced) apply(data[,preced,drop=FALSE], 1, function(z) z[!is.na(z)][1])
Same functionality, fewer calls, simple logic.
(Attribution: this alternative is a combination of work from @tmfmnk, @Tjebo, and me. Thanks!)
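As a quick check against the question's desired output, here is the alternative applied to a two-column and a one-column preference. The matrix is rebuilt with matrix() purely for a self-contained sketch, and two_cols and one_col are illustrative names.

```r
# Same values as the question's foo_mat, rebuilt so the sketch runs on its own.
foo_mat <- matrix(
  c(NA, 30, 15, 0, NA,   # column a
    100, 87, NA, 0, NA,  # column b
    2, NA, 10, 0, NA),   # column c
  nrow = 5, dimnames = list(NULL, c("a", "b", "c"))
)

foo <- function(data, preced) {
  apply(data[, preced, drop = FALSE], 1, function(z) z[!is.na(z)][1])
}

# Only "b" and "a" are searched; a value in "c" is ignored.
two_cols <- foo(foo_mat, c("b", "a"))
two_cols  # 100 87 15 0 NA

# A single column works as well, thanks to drop = FALSE.
one_col <- foo(foo_mat, "c")
one_col   # 2 NA 10 0 NA
```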