I want to generate a new variable that counts the number of times some columns satisfy a criterion (like ==, <, >). The function needs to handle NA values.
Sample data with some missing values:
x <- seq(10, 20)
y <- seq(12, 22)
y[4] <- NA
z <- seq(14, 24)
z[c(4,5)] <- NA
data <- cbind(x, y, z)
# x y z
# [1,] 10 12 14
# [2,] 11 13 15
# [3,] 12 14 16
# [4,] 13 NA NA
# [5,] 14 16 NA
# [6,] 15 17 19
# [7,] 16 18 20
# [8,] 17 19 21
# [9,] 18 20 22
# [10,] 19 21 23
# [11,] 20 22 24
In this example, what I want is a variable, "less16", that counts the number of values in each row that are < 16 across columns "x", "y" and "z". Desired result for the first few rows:
x y z less16
10 12 14 3
11 13 15 3
12 14 16 2
13 NA NA 1
14 16 NA 1
etc
I've tried rowSums, sum, which, and for loops using if and else, all to no avail so far. Any advice would be greatly appreciated. Thanks in advance.
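rowSums has an na.rm argument. data < 16 returns a logical matrix (NA wherever a value is missing), and rowSums counts the TRUE values in each row while skipping the NAs. Because data was built with cbind it is a matrix, not a data frame, so data$less16 would fail; bind the result as a new column instead:
data <- cbind(data, less16 = rowSums(data < 16, na.rm = TRUE))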
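To see why the rowSums approach works, here is a small sketch that rebuilds the sample data from the question and keeps the intermediate logical matrix in a variable (the names hits and less16 are just illustrative). It also shows what happens if na.rm = TRUE is left out: any row containing an NA sums to NA.

```r
# Rebuild the sample data from the question
x <- seq(10, 20)
y <- seq(12, 22)
y[4] <- NA
z <- seq(14, 24)
z[c(4, 5)] <- NA
data <- cbind(x, y, z)

# Step 1: the comparison is vectorised over the whole matrix, giving
# TRUE/FALSE where a value exists and NA where it is missing.
hits <- data < 16
print(head(hits, 5))

# Step 2: rowSums treats TRUE as 1 and FALSE as 0; na.rm = TRUE drops
# the NAs instead of letting them turn the row total into NA.
less16 <- rowSums(hits, na.rm = TRUE)
print(less16)

# Without na.rm, rows 4 and 5 (which contain NAs) come out as NA.
print(rowSums(hits))
```

For the sample data, less16 comes out as 3, 3, 2, 1, 1, 1 followed by zeros, matching the desired result in the question.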
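Many of these functions, sum included, have an na.rm parameter for excluding NA values, so you can also apply sum to each row. Selecting the columns explicitly keeps the count restricted to x, y and z even if other columns (such as less16) have already been added:
apply(data[, c("x", "y", "z")], 1, function(row) sum(row < 16, na.rm = TRUE))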
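The question asks about criteria other than < as well (==, >). One way to cover them with the same rowSums idea is a small wrapper that takes the operator by name. This is a sketch only: count_if is a hypothetical helper written for illustration, not a base R function. It relies on match.fun, which turns a string such as "<" into the corresponding operator function.

```r
# Hypothetical helper (not part of base R): counts, per row, how many of
# the chosen columns satisfy `value <op> threshold`, ignoring NAs.
count_if <- function(data, cols, op, threshold) {
  compare <- match.fun(op)  # e.g. "<", ">", "==" -> the operator function
  hits <- compare(data[, cols, drop = FALSE], threshold)  # logical matrix
  rowSums(hits, na.rm = TRUE)
}

# Sample data from the question
x <- seq(10, 20)
y <- seq(12, 22)
y[4] <- NA
z <- seq(14, 24)
z[c(4, 5)] <- NA
data <- cbind(x, y, z)

cols <- c("x", "y", "z")
less16 <- count_if(data, cols, "<", 16)
more20 <- count_if(data, cols, ">", 20)
eq16 <- count_if(data, cols, "==", 16)
print(cbind(data, less16, more20, eq16))

# The same call works on a data frame, because comparing a data frame
# with a number also returns a logical matrix.
df <- as.data.frame(data)
print(count_if(df, cols, "<", 16))
```

Passing cols explicitly means earlier result columns are never counted again, which is the same pitfall the column selection in the apply version avoids.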