R: How can I sum across variables, within cases, while counting NA as zero

Tags: r, na

Fake data for illustration:

df <- data.frame(a=c(1,2,3,4,5), b=c(2,2,2,2,NA),
                 c=c(NA,2,3,4,5))

This would get me the answer I want IF it weren't for the NA values:

df$count <- with(df, (a==1) + (b==2) + (c==3)) 

Also, would there be an even more elegant way if I were only interested in a single value, e.g. counting how many of the variables equal 2?

df$count <- with(df, (a==2) + (b==2) + (c==2)) 

Many thanks!

Grateful Guy asked Mar 23 '12 20:03


2 Answers

The following works for your specific example, but I have a suspicion that your real use case is more complicated:

df$count <- apply(df,1,function(x){sum(x == 1:3,na.rm = TRUE)})
> df
  a  b  c count
1 1  2 NA     2
2 2  2  2     1
3 3  2  3     2
4 4  2  4     1
5 5 NA  5     0

The same general approach should work more broadly, though. For instance, your second example would be something like this (restricted to the original three columns, so that the count column added above is not compared as well):

df$count <- apply(df[c("a","b","c")],1,function(x){sum(x == 2,na.rm = TRUE)})

or more generally you could allow yourself to pass in a variable for the comparison:

df$count <- apply(df[c("a","b","c")],1,function(x,compare){sum(x == compare,na.rm = TRUE)},compare = 1:3)
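
If you need this in more than one place, the pattern above could be wrapped in a small function that takes the columns explicitly. This is only a sketch: count_matches and the vars vector are hypothetical names introduced here for illustration, not part of the original answer, and the second result is stored in a made-up column count_2 so the two counts do not overwrite each other.

```r
# Same fake data as in the question.
df <- data.frame(a = c(1, 2, 3, 4, 5),
                 b = c(2, 2, 2, 2, NA),
                 c = c(NA, 2, 3, 4, 5))

# Hypothetical helper: for each row, count how many of the chosen columns
# equal the corresponding entry of `compare`. NA comparisons are dropped by
# na.rm = TRUE, so they contribute zero to the count.
count_matches <- function(data, cols, compare) {
  apply(data[cols], 1, function(x) sum(x == compare, na.rm = TRUE))
}

vars <- c("a", "b", "c")

# a == 1, b == 2, c == 3 (compare is matched to the columns by position)
df$count <- count_matches(df, vars, 1:3)

# how many of a, b, c equal 2 (a length-one compare is recycled)
df$count_2 <- count_matches(df, vars, 2)
```

Passing the column names each time means later calls never see the count columns that earlier calls added to df.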
joran answered Oct 18 '22 15:10


Another way is to subtract your target vector from each row of your data.frame, negate the result (so that a difference of zero becomes TRUE and an NA stays NA), and then do rowSums with na.rm=TRUE:

target <- 1:3
rowSums(!(df-rep(target,each=nrow(df))),na.rm=TRUE)
[1] 2 1 2 1 0

target <- rep(2,3)
rowSums(!(df-rep(target,each=nrow(df))),na.rm=TRUE)
[1] 1 3 1 1 0
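
A variation on the same idea, as a sketch rather than part of the original answer: compare directly instead of subtracting and negating. Base R's sweep() applies a function between each column of a matrix and the matching element of a vector, so using "==" as the function gives the logical matrix in one step. The names m and eq are just illustrative.

```r
# Same fake data as in the question.
df <- data.frame(a = c(1, 2, 3, 4, 5),
                 b = c(2, 2, 2, 2, NA),
                 c = c(NA, 2, 3, 4, 5))

# Work on a numeric matrix of just the columns being compared.
m <- as.matrix(df[c("a", "b", "c")])

# Column-wise targets: a == 1, b == 2, c == 3.
target <- 1:3
eq <- sweep(m, 2, target, FUN = "==")   # logical matrix; NA where m is NA
count_target <- rowSums(eq, na.rm = TRUE)

# Single target value: a plain comparison is enough, no sweep() needed.
count_twos <- rowSums(m == 2, na.rm = TRUE)
```

For the single-value case from the question, rowSums(m == 2, na.rm = TRUE) is probably about as short as it gets.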
James answered Oct 18 '22 15:10