I have a dataset consisting of multiple cases, each with several stamp columns that are either 1 or NA. I'm trying to figure out a way to return the highest-numbered stamp that is not NA for each case.
Here are some sample data:
PIN <- c("case1", "case2", "case3", "case4", "case5")
STAMP_1 <- c(1, 1, 1, 1, 1)
STAMP_2 <- c(NA, 1, 1, NA, 1)
STAMP_3 <- c(1, NA, 1, 1, NA)
STAMP_4 <- c(NA, NA, 1, 1, NA)
STAMP_5 <- c(1, NA, NA, 1, NA)
data <- data.frame(PIN, STAMP_1, STAMP_2, STAMP_3, STAMP_4, STAMP_5)
I'd like to figure out a way to return a data frame with two columns: one holding the cases ("case1", "case2", "case3", "case4", "case5") and one holding the last non-NA stamp for each case, which would be "STAMP_5", "STAMP_2", "STAMP_4", "STAMP_5", "STAMP_2" here.
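Here is a method with max.col, is.na and names (the code assumes the data frame is stored as dat). max.col finds the column with the maximum value for each row. Here, we feed it the negated value of is.na, which is TRUE for every non-NA entry and FALSE otherwise, and use ties.method="last" so that, among the tied TRUE values, the final non-NA position is returned. This position is used to index names(dat)[-1], the column names without PIN.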
data.frame(PIN=dat$PIN,
stamp=names(dat)[-1][max.col(!is.na(dat[-1]), ties.method="last")])
PIN stamp
1 case1 STAMP_5
2 case2 STAMP_2
3 case3 STAMP_4
4 case4 STAMP_5
5 case5 STAMP_2
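To make the intermediate steps visible, here is a minimal self-contained sketch that rebuilds the sample data under the name dat and splits the one-liner into pieces. The helper names present, pos and stamp are only for illustration and are not part of the original answer.

```r
# Sample data from the question, stored as `dat` to match the answer's code
dat <- data.frame(
  PIN     = c("case1", "case2", "case3", "case4", "case5"),
  STAMP_1 = c(1, 1, 1, 1, 1),
  STAMP_2 = c(NA, 1, 1, NA, 1),
  STAMP_3 = c(1, NA, 1, 1, NA),
  STAMP_4 = c(NA, NA, 1, 1, NA),
  STAMP_5 = c(1, NA, NA, 1, NA)
)

# Logical matrix: TRUE where a stamp is present (PIN column dropped)
present <- !is.na(dat[-1])

# Position of the last TRUE in each row; ties among TRUEs go to the last one
pos <- max.col(present, ties.method = "last")

# Translate positions back into column names
stamp <- names(dat)[-1][pos]

data.frame(PIN = dat$PIN, stamp = stamp)
```

Splitting it up this way makes it easy to inspect present and pos when the result looks unexpected.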
In the case that you have an entire row of NAs, max.col will return the final position of the row (a silent failure?). One way to return an NA rather than that position is to use a trick with NA and exponentiation. Here, we check each row for being all NA: such rows return TRUE (or 1), and NA^1 is NA, while rows that have at least one non-NA value return FALSE (or 0), and NA^0 is 1. Multiplying the positions by this result leaves valid positions unchanged and turns the positions of all-NA rows into NA.
data.frame(PIN=dat$PIN,
stamp=names(dat)[-1][
max.col(!is.na(dat[-1]), ties.method="last") * NA^!rowSums(!is.na(dat[-1]))])
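As a small illustration of the all-NA case, the sketch below uses a hypothetical three-row data frame, dat2, in which case2 has no stamps at all. The names dat2, present, pos, all_na, mask and stamp are made up for this example.

```r
# Hypothetical data: case2 has no stamps at all
dat2 <- data.frame(
  PIN     = c("case1", "case2", "case3"),
  STAMP_1 = c(1, NA, 1),
  STAMP_2 = c(NA, NA, 1),
  STAMP_3 = c(1, NA, NA)
)

present <- !is.na(dat2[-1])

# Without the correction, the all-NA row silently gets the last position
pos <- max.col(present, ties.method = "last")

# TRUE for rows with zero non-NA entries
all_na <- !rowSums(present)

# NA^TRUE is NA, NA^FALSE is 1
mask <- NA^all_na

# Indexing with NA yields NA, so case2 gets no stamp name
stamp <- names(dat2)[-1][pos * mask]

data.frame(PIN = dat2$PIN, stamp = stamp)
```

The stamp for case2 comes out as NA instead of the misleading last column name.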
I switched from apply, as in apply(dat[-1], 1, function(x) all(is.na(x))), to !rowSums(!is.na(dat[-1])) after Frank's suggestion. This should be quite a bit faster than apply.
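To check that the two all-NA tests agree and to compare their speed on your own machine, something like the following sketch could be used. The simulated data frame big and the names by_apply and by_rowsums are assumptions for illustration; timings will vary, so none are quoted here.

```r
# Simulated data: 50,000 rows, 4 stamp columns, each entry 1 or NA
set.seed(1)
big <- as.data.frame(matrix(sample(c(1, NA), 2e5, replace = TRUE), ncol = 4))

# Row-wise test with apply
by_apply <- apply(big, 1, function(x) all(is.na(x)))

# Vectorised test with rowSums
by_rowsums <- !rowSums(!is.na(big))

# Both approaches should flag exactly the same rows
stopifnot(all(by_apply == by_rowsums))

# Rough timing comparison
system.time(apply(big, 1, function(x) all(is.na(x))))
system.time(!rowSums(!is.na(big)))
```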