I have a matrix with 3 columns. For each row, I want to select the first non-missing value: if column 1 has no value, column 2 is searched, then column 3. The search order is given by the user.
I am moderately happy with my convoluted nested ifelse approach. Alas, it depends on a fixed number of columns. The number of columns should be flexible (and with it the number of nested ifelse calls): if the user selects only one or two columns, the result should be NA even if an unselected column contains a value.
foo_mat <- structure(c(
  NA, 30L, 15, 0, NA, 100L, 87L, NA, 0, NA, 2L, NA,
  10, 0, NA
), .Dim = c(5L, 3L), .Dimnames = list(NULL, c(
  "a", "b", "c"
)))
foo <- function(x, preced) {
  ifelse(!is.na(x[, preced[1]]), x[, preced[1]],
    ifelse(!is.na(x[, preced[2]]), x[, preced[2]],
      x[, preced[3]]
    )
  )
}
foo_mat
#> a b c
#> [1,] NA 100 2
#> [2,] 30 87 NA
#> [3,] 15 NA 10
#> [4,] 0 0 0
#> [5,] NA NA NA
foo(foo_mat, c("a", "c", "b"))
#> [1] 2 30 15 0 NA
foo(foo_mat, preced = c("b", "a"))
#> Error in x[, preced[3]]: subscript out of bounds #(of course)
# desired output
#> [1] 100 87 15 0 NA
Base R:
apply(foo_mat[,c("a","c","b")], 1, function(z) c(na.omit(z), NA)[1])
# [1] 2 30 15 0 NA
The anon-function is a two-step process:
- na.omit(z) removes the NAs, so that we can grab the first non-NA value.
- If every value in the row is NA, na.omit(.) returns a zero-length vector, which is not what you want. The c(., NA)[1] ensures that after na.omit(.) we always have at least one value in the c(.) vector, and we take the first of them; if na.omit returns nothing, then at least we have the one NA.
Doing this row-wise is done with apply(foo_mat, 1, ...). You control the preference order by re-arranging the columns going into the apply data, as in my use of foo_mat[,c("a","c","b")].
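To make the two steps concrete, here is a small sketch that walks through them on single rows of the question's matrix. The variable names (z, kept, first_val, empty, fallback) are only illustrative and not part of the answer's function.

```r
# Rebuild the example matrix from the question.
foo_mat <- structure(
  c(NA, 30, 15, 0, NA, 100, 87, NA, 0, NA, 2, NA, 10, 0, NA),
  .Dim = c(5L, 3L), .Dimnames = list(NULL, c("a", "b", "c"))
)

# Row 1, with the columns in the preference order a, c, b.
z <- foo_mat[1, c("a", "c", "b")]
kept <- na.omit(z)           # step 1: drops the NA in "a", leaving c = 2 and b = 100
first_val <- c(kept, NA)[1]  # step 2: take the first surviving value (2)

# Row 5 is all NA, so na.omit() leaves nothing behind.
empty <- na.omit(foo_mat[5, c("a", "c", "b")])
fallback <- c(empty, NA)[1]  # the appended NA is the only element, so we get NA
```

Without the appended NA, indexing the empty vector would still give NA here, but the explicit c(., NA) makes the intent obvious and keeps the result a length-1 value in every case.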
As a function:
foo <- function(data, preced = colnames(data)) apply(data[,preced,drop=FALSE], 1, function(z) c(na.omit(z), NA)[1])
foo(foo_mat, c("a", "c", "b"))
# [1] 2 30 15 0 NA
(The drop=FALSE is defensive. By default, base R reduces foo_mat[,"a"] to a vector instead of a 1-column matrix. This breaks many things, including apply. Adding drop=FALSE prevents the default reduction behavior.)
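A short sketch of why this matters when the user selects only one column. The names one_col_vec, one_col_mat, res and err are illustrative only.

```r
# Rebuild the example matrix from the question.
foo_mat <- structure(
  c(NA, 30, 15, 0, NA, 100, 87, NA, 0, NA, 2, NA, 10, 0, NA),
  .Dim = c(5L, 3L), .Dimnames = list(NULL, c("a", "b", "c"))
)

one_col_vec <- foo_mat[, "a"]                # collapses to a plain vector
one_col_mat <- foo_mat[, "a", drop = FALSE]  # stays a 5 x 1 matrix

is.null(dim(one_col_vec))  # TRUE: the dim attribute is gone
dim(one_col_mat)           # 5 1

# apply() needs a dim attribute, so the matrix version works ...
res <- apply(one_col_mat, 1, function(z) c(na.omit(z), NA)[1])

# ... while the collapsed vector makes apply() signal an error.
err <- tryCatch(
  apply(one_col_vec, 1, function(z) c(na.omit(z), NA)[1]),
  error = function(e) e
)
```

With a single selected column, res is simply that column (NA, 30, 15, 0, NA), which is the behavior the question asks for.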
An alternative that is about as fast as the other answers:
foo <- function(data, preced) apply(data[,preced,drop=FALSE], 1, function(z) z[!is.na(z)][1])
Same functionality, fewer calls, simple logic.
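As a quick check, this sketch runs the alternative on the two-column case from the question that broke the nested ifelse, plus a one-column case. The names res_two and res_one are illustrative only.

```r
# Rebuild the example matrix from the question.
foo_mat <- structure(
  c(NA, 30, 15, 0, NA, 100, 87, NA, 0, NA, 2, NA, 10, 0, NA),
  .Dim = c(5L, 3L), .Dimnames = list(NULL, c("a", "b", "c"))
)

# The alternative from above: keep the non-NA values, take the first.
# Indexing an empty vector with [1] yields NA, so all-NA rows give NA.
foo <- function(data, preced) {
  apply(data[, preced, drop = FALSE], 1, function(z) z[!is.na(z)][1])
}

# Two columns: column "c" is ignored even where it holds a value.
res_two <- foo(foo_mat, c("b", "a"))  # values: 100 87 15 0 NA

# One column: rows where "c" is NA stay NA.
res_one <- foo(foo_mat, "c")          # values: 2 NA 10 0 NA
```

The first result matches the desired output stated in the question.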
(Attribution: this alternative is a combination of work from @tmfmnk, @Tjebo, and me. Thanks!)