
R - Applying condition across multiple columns ignoring NA

Suppose I have the following dataframe:

x <- c(1, 1, 2, 3, 4, 5)
y <- c(1, 1, 1, 3, 4, 4)
z <- c(NA, 1, 1, 3, 4, NA)
df <- data.frame(x, y, z)

which gives:

x  y  z
1  1  NA
1  1  1
2  1  1
3  3  3
4  4  4
5  4  NA

I want a conditional flag: if all of the non-NA values of x, y, and z in a row are equal to 1, the row should be flagged as 1. How would I go about writing this?

For instance, what I want is the following:

x  y  z  flag1
1  1  NA 1
1  1  1  1
2  1  1  0
3  3  3  0
4  4  4  0
5  4  NA 0

Additionally, I want to flag rows where any of the variables contains a 4, ignoring NA, so that I get:

x  y  z  flag1 flag2
1  1  NA 1     0
1  1  1  1     0
2  1  1  0     0
3  3  3  0     0
4  4  4  0     1
5  4  NA 0     1
ssjjaca asked Jan 21 '26 22:01



2 Answers

The easiest approach is with rowSums:

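cols <- c("x", "y", "z")  # restrict to the data columns so the new flag column is not rescanned
df$flag <- +(!rowSums(df[cols] != 1, na.rm = TRUE) & !!rowSums(!is.na(df[cols])))
df$flag2 <- +(rowSums(df[cols] == 4, na.rm = TRUE) > 0 & !!rowSums(!is.na(df[cols])))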

-output

> df
  x y  z flag flag2
1 1 1 NA    1     0
2 1 1  1    1     0
3 2 1  1    0     0
4 3 3  3    0     0
5 4 4  4    0     1
6 5 4 NA    0     1
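
To see why the rowSums approach works, the expression can be unpacked into steps. This is only a sketch and not part of the original answer; the intermediate names (cols, n_not_one, n_present) are made up for illustration, and the data is the df from the data section.

```r
df <- data.frame(x = c(1, 1, 2, 3, 4, 5),
                 y = c(1, 1, 1, 3, 4, 4),
                 z = c(NA, 1, 1, 3, 4, NA))
cols <- c("x", "y", "z")

# Per row: how many non-NA values differ from 1
# (comparisons against NA yield NA and are dropped by na.rm = TRUE)
n_not_one <- rowSums(df[cols] != 1, na.rm = TRUE)
# Per row: how many values are present at all
n_present <- rowSums(!is.na(df[cols]))

# flag1 is 1 when nothing differs from 1 and the row is not entirely NA
flag1 <- as.integer(n_not_one == 0 & n_present > 0)
# flag2 is 1 when at least one non-NA value equals 4
flag2 <- as.integer(rowSums(df[cols] == 4, na.rm = TRUE) > 0)
```

The unary `+` and the `!`/`!!` in the one-liners are just shorthand for these `as.integer()`, `== 0`, and `> 0` steps.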

In the tidyverse, we can use if_all and if_any to create those columns:

library(dplyr)
df %>%
    mutate(flag1 = +(if_all(x:z, ~ is.na(.) | . %in% 1)),
           flag2 = +(if_any(x:z, ~ . %in% 4)))

-output

  x y  z flag1 flag2
1 1 1 NA     1     0
2 1 1  1     1     0
3 2 1  1     0     0
4 3 3  3     0     0
5 4 4  4     0     1
6 5 4 NA     0     1

data

df <-structure(list(x = c(1, 1, 2, 3, 4, 5), y = c(1, 1, 1, 3, 4, 
4), z = c(NA, 1, 1, 3, 4, NA)), class = "data.frame", row.names = c(NA, 
-6L))
akrun answered Jan 23 '26 10:01



Here's a version that is more verbose than @akrun's answer (and slower on larger datasets), but more customizable:

flag1 <- ifelse( (x == 1 | is.na(x) ) &
                 (y == 1 | is.na(y) ) &
                 (z == 1 | is.na(z) ), 1, 0)

# %in% never returns NA, so rows containing NA get 0 instead of NA
flag2 <- ifelse( x %in% 4 | y %in% 4 | z %in% 4, 1, 0)
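
One thing to watch with element-wise comparisons: `==` returns NA when either side is NA, and `FALSE | NA` is also NA, so a chain of `==` tests can produce NA rather than 0 for rows with missing values. The sketch below shows this on the values of the question's first row; the result names (naive, with_in, with_any) are made up for illustration.

```r
x <- 1; y <- 1; z <- NA

# FALSE | FALSE | NA evaluates to NA, which ifelse() would pass through as NA
naive <- x == 4 | y == 4 | z == 4
# %in% treats NA as "no match" and returns FALSE
with_in <- x %in% 4 | y %in% 4 | z %in% 4
# any() with na.rm = TRUE drops the NA comparison before combining
with_any <- any(c(x, y, z) == 4, na.rm = TRUE)
```

Here naive is NA, while with_in and with_any are both FALSE. Note that `TRUE | NA` is TRUE, so the problem only shows up in rows where no value matches.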

If you had a bunch of these vectors, you could store them in a matrix or data.frame so you don't need to list each column in order to do the calculation:

mat <- cbind(x,y,z)

# unary + converts the logical result to 0/1
flag1 <- +apply(mat, 1, function(r) all(r == 1 | is.na(r)))
flag2 <- +apply(mat, 1, function(r) any(r == 4, na.rm = TRUE))
DanY answered Jan 23 '26 10:01



