
Row-wise count of values that fulfill a condition

Tags: r, r-faq, sum, rowsum

I want to generate a new variable that holds, for each row, the number of columns whose value satisfies a criterion (such as ==, < or >). The function needs to handle NA.

Sample data with some missing values:

x <- seq(10, 20)
y <- seq(12, 22)
y[4] <- NA
z <- seq(14, 24)
z[c(4,5)] <- NA
data <- cbind(x, y, z)
#        x  y  z
# [1,]  10 12 14
# [2,]  11 13 15
# [3,]  12 14 16
# [4,]  13 NA NA
# [5,]  14 16 NA
# [6,]  15 17 19
# [7,]  16 18 20
# [8,]  17 19 21
# [9,]  18 20 22
# [10,] 19 21 23
# [11,] 20 22 24

In this example, I want a variable, "less16", that counts the values in each row that are < 16 across columns "x", "y" and "z", ignoring NA. Desired result for the first few rows:

 x   y   z  less16
10  12  14       3
11  13  15       3
12  14  16       2
13  NA  NA       1
14  16  NA       1
etc

I've tried rowSum, sum, which, for loops using if and else, all to no avail so far. Any advice would be greatly appreciated. Thanks in advance.

SMM asked Dec 10 '22 01:12


2 Answers

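rowSums has the argument na.rm. The comparison data < 16 returns a logical matrix, and summing each row with na.rm = TRUE counts the TRUE values while skipping NA. Because data is a matrix (it was built with cbind), append the result with cbind rather than with $, which only works on data frames and lists:

data <- cbind(data, less16 = rowSums(data < 16, na.rm = TRUE))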
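For completeness, here is a minimal end-to-end sketch of the rowSums approach on the sample data from the question. It assumes the values are held in a data frame (the name df is introduced here for illustration only; the question itself uses a matrix), so that the $ assignment works and the count can be restricted to named columns.

```r
# Rebuild the sample data from the question
x <- seq(10, 20)
y <- seq(12, 22)
y[4] <- NA
z <- seq(14, 24)
z[c(4, 5)] <- NA

# Assumption: store the columns in a data frame instead of a matrix
df <- data.frame(x, y, z)

# The comparison gives a logical matrix in which NA stays NA;
# na.rm = TRUE drops those NA entries before the TRUE values are summed.
# Selecting the columns explicitly keeps less16 itself out of the count
# if the line is run more than once.
df$less16 <- rowSums(df[, c("x", "y", "z")] < 16, na.rm = TRUE)

head(df, 5)
```

The first five rows of less16 should match the desired result in the question (3, 3, 2, 1, 1).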
Jesse Anderson answered Dec 27 '22 02:12


A lot of these functions actually have a na.rm parameter for excluding NA values:

apply(data, 1, function(x) sum(x < 16, na.rm = TRUE))
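Since the question also mentions other criteria (==, >), the same idea can be wrapped in a small helper. This is only a sketch: count_rowwise is a hypothetical name, not an existing R function, and it assumes the operator is passed as a string that match.fun can resolve.

```r
# Rebuild the sample matrix from the question
x <- seq(10, 20)
y <- seq(12, 22)
y[4] <- NA
z <- seq(14, 24)
z[c(4, 5)] <- NA
data <- cbind(x, y, z)

# Hypothetical helper: count, per row, the values for which
# `op(value_in_cell, value)` is TRUE, ignoring NA
count_rowwise <- function(m, op, value) {
  hits <- match.fun(op)(m, value)  # e.g. `<`(m, 16) gives a logical matrix
  rowSums(hits, na.rm = TRUE)
}

less16  <- count_rowwise(data, "<", 16)
equal16 <- count_rowwise(data, "==", 16)
over20  <- count_rowwise(data, ">", 20)

# The apply() form gives the same counts, one row at a time
less16_apply <- apply(data, 1, function(r) sum(r < 16, na.rm = TRUE))
```

On larger matrices rowSums is usually the faster of the two, because apply calls the anonymous function once per row.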
joran answered Dec 27 '22 02:12