 

Return last data frame column which is not NA

Tags:

dataframe

r

na

I have a dataset of multiple cases, each stamped either 1 or NA. I'm trying to find a way to return the highest-numbered stamp that is not NA for each case.

Here are some sample data:

PIN <- c("case1", "case2", "case3", "case4", "case5")
STAMP_1 <- c(1, 1, 1, 1, 1)
STAMP_2 <- c(NA, 1, 1, NA, 1)
STAMP_3 <- c(1, NA, 1, 1, NA)
STAMP_4 <- c(NA, NA, 1, 1, NA)
STAMP_5 <- c(1, NA, NA, 1, NA)
data <- data.frame(PIN, STAMP_1, STAMP_2, STAMP_3, STAMP_4, STAMP_5)

I'd like to return a data frame with two columns: the cases ("case1", "case2", "case3", "case4", "case5") and, for this sample, the matching stamps ("STAMP_5", "STAMP_2", "STAMP_4", "STAMP_5", "STAMP_2").

Jklein asked Dec 24 '22 14:12


1 Answer

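Here is a method using max.col, is.na and names (dat below is the question's data frame, which is called data there). max.col finds the column with the maximum value for each row. Here, we feed it !is.na(dat[-1]), a logical matrix that is TRUE wherever a stamp is present, and use ties.method="last" so that, among the tied TRUE values, the last column wins. That position is then used to index names(dat)[-1], the stamp column names.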

data.frame(PIN=dat$PIN,
           stamp=names(dat)[-1][max.col(!is.na(dat[-1]), ties.method="last")])
    PIN   stamp
1 case1 STAMP_5
2 case2 STAMP_2
3 case3 STAMP_4
4 case4 STAMP_5
5 case5 STAMP_2
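
To make the steps easier to follow, here is a minimal sketch that splits the one-liner into intermediate values. The names present, last_pos and last_stamp are introduced only for illustration, and the data frame is rebuilt under the name dat so the snippet runs on its own.

```r
# Sample data from the question, named `dat` to match the answer's code
dat <- data.frame(
  PIN     = c("case1", "case2", "case3", "case4", "case5"),
  STAMP_1 = c(1, 1, 1, 1, 1),
  STAMP_2 = c(NA, 1, 1, NA, 1),
  STAMP_3 = c(1, NA, 1, 1, NA),
  STAMP_4 = c(NA, NA, 1, 1, NA),
  STAMP_5 = c(1, NA, NA, 1, NA)
)

# Logical matrix: TRUE where a stamp is present, FALSE where it is NA
present <- !is.na(dat[-1])

# Per row, the column position of the maximum; all TRUEs tie,
# and ties.method = "last" picks the right-most one
last_pos <- max.col(present, ties.method = "last")

# Map the positions back to the stamp column names
last_stamp <- names(dat)[-1][last_pos]

print(last_pos)    # 5 2 4 5 2
print(last_stamp)  # "STAMP_5" "STAMP_2" "STAMP_4" "STAMP_5" "STAMP_2"
```

Dropping the first column with dat[-1] matters twice: once so PIN is not tested for NA, and once so the positions line up with names(dat)[-1].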

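If an entire row is NA, every entry in that row ties at FALSE, so max.col with ties.method="last" returns the last column position (a silent failure?). One way to return NA instead of that position is a trick with NA and exponentiation: NA^0 is 1, while NA^1 is NA. !rowSums(!is.na(dat[-1])) is TRUE (or 1) for rows that are entirely NA and FALSE (or 0) for rows that have at least one non-NA value, so multiplying the position by NA raised to that value leaves ordinary rows unchanged and turns all-NA rows into NA.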

data.frame(PIN=dat$PIN,
           stamp=names(dat)[-1][
                max.col(!is.na(dat[-1]), ties.method="last") * NA^!rowSums(!is.na(dat[-1]))])
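
The following sketch shows the effect on a small made-up data frame that contains one all-NA row. The data and the names present, all_na, mask and pos are assumptions for illustration, not part of the original question.

```r
# Hypothetical data: case2 has no stamps at all
dat <- data.frame(
  PIN     = c("case1", "case2", "case3"),
  STAMP_1 = c(1, NA, 1),
  STAMP_2 = c(NA, NA, 1),
  STAMP_3 = c(1, NA, NA)
)

present <- !is.na(dat[-1])

# TRUE only for rows with zero non-NA values
all_na <- !rowSums(present)

# NA^TRUE is NA, NA^FALSE is 1
mask <- NA^all_na

# Without the mask, the all-NA row would report position 3
pos <- max.col(present, ties.method = "last") * mask

result <- data.frame(PIN = dat$PIN, stamp = names(dat)[-1][pos])
print(result)
```

Indexing a character vector with NA yields NA, so case2 ends up with a missing stamp rather than a misleading STAMP_3.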

I switched from apply(dat[-1], 1, function(x) all(is.na(x))) to !rowSums(!is.na(dat[-1])) after Frank's suggestion. This should be quite a bit faster than apply.
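
As a quick check, the two expressions can be compared on a small made-up data frame (the data and the names via_apply and via_rowsums are assumptions for illustration). This only demonstrates that they flag the same rows; it does not measure speed.

```r
# Hypothetical data with one all-NA row
dat <- data.frame(
  PIN     = c("case1", "case2", "case3"),
  STAMP_1 = c(1, NA, 1),
  STAMP_2 = c(NA, NA, 1),
  STAMP_3 = c(1, NA, NA)
)

# Row-by-row version: calls an R function once per row
via_apply <- apply(dat[-1], 1, function(x) all(is.na(x)))

# Vectorized version: a row with zero non-NA values negates to TRUE
via_rowsums <- !rowSums(!is.na(dat[-1]))

print(unname(via_apply))    # FALSE  TRUE FALSE
print(unname(via_rowsums))  # FALSE  TRUE FALSE
```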

lmo answered Jan 12 '23 16:01