
Using `:=` in data.table to sum the values of two columns in R, ignoring NAs


I have what I think is a very simple question about data.table and its := operator. I don't think I fully understand how := behaves, and I often run into similar problems.

Here is some example data

    mat <- structure(list(
        col1 = c(NA, 0, -0.015038, 0.003817, -0.011407),
        col2 = c(0.003745, 0.007463, -0.007407, -0.003731, -0.007491)),
        .Names = c("col1", "col2"),
        row.names = c(NA, -5L),
        class = c("data.table", "data.frame"))

which gives

    > mat
            col1      col2
    1:        NA  0.003745
    2:  0.000000  0.007463
    3: -0.015038 -0.007407
    4:  0.003817 -0.003731
    5: -0.011407 -0.007491

I want to create a column called col3 which gives the sum of col1 and col2. If I use

    mat[, col3 := col1 + col2]
    #         col1      col2      col3
    # 1:        NA  0.003745        NA
    # 2:  0.000000  0.007463  0.007463
    # 3: -0.015038 -0.007407 -0.022445
    # 4:  0.003817 -0.003731  0.000086
    # 5: -0.011407 -0.007491 -0.018898

then I get an NA for the first row, but I want NAs to be ignored. So I tried instead

    mat[, col3 := sum(col1, col2, na.rm = TRUE)]
    #         col1      col2      col3
    # 1:        NA  0.003745 -0.030049
    # 2:  0.000000  0.007463 -0.030049
    # 3: -0.015038 -0.007407 -0.030049
    # 4:  0.003817 -0.003731 -0.030049
    # 5: -0.011407 -0.007491 -0.030049

which is not what I am after, since it gives me the sum of all elements of col1 and col2. I think I don't quite get :=... How can I get the row-wise sum of col1 and col2, ignoring NA values?

Not sure this is relevant, but here is my sessionInfo

    > sessionInfo()
    R version 2.15.1 (2012-06-22)
    Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

    locale:
    [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] data.table_1.8.3
Vivi asked Oct 28 '12 05:10




1 Answer

This is standard R behaviour and has nothing really to do with data.table.

Adding anything to NA will return NA:

    NA + 1 ## NA

sum will return a single number for all of its arguments combined, and := then recycles that one number into every row.
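To see why the second attempt in the question filled every row with the same value, here is a small base-R sketch. The vectors `a` and `b` are made up for illustration and simply stand in for col1 and col2.

```r
# Hypothetical vectors standing in for col1 and col2
a <- c(NA, 1, 2)
b <- c(10, 20, 30)

# sum() pools all of its arguments into one total
total <- sum(a, b, na.rm = TRUE)
total          # 63

# `+` works element by element, so the NA carries through in position 1
elementwise <- a + b
elementwise    # NA 21 32
```

A length-one result assigned with := is repeated down the whole column, which is where the constant -0.030049 in the question comes from.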

If you want 1 + NA to return 1, then you will have to run something like

    mat[, col3 := col1 + col2]
    mat[is.na(col1), col3 := col2]
    mat[is.na(col2), col3 := col1]

This handles the rows where either col1 or col2 is NA (a row where both are NA stays NA).


EDIT - an easier solution

You could also use rowSums, which has an na.rm argument:

    mat[, col3 := rowSums(.SD, na.rm = TRUE), .SDcols = c("col1", "col2")]

rowSums is what you want: by definition, it gives the row sums of a matrix containing col1 and col2, with NA values removed.

(@JoshuaUlrich suggested this in a comment.)
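As a rough side-by-side of the two approaches, the sketch below runs both on a small made-up table. The table `dt` and the column names `patched` and `summed` are assumptions for illustration only. Row 3 has NA in both columns so the one difference between the approaches is visible.

```r
library(data.table)

# Hypothetical data: row 3 is NA in both columns
dt <- data.table(col1 = c(NA, 0, NA), col2 = c(0.5, 1, NA))

# Approach 1: add, then patch the rows where one side is NA
dt[, patched := col1 + col2]
dt[is.na(col1), patched := col2]
dt[is.na(col2), patched := col1]

# Approach 2: rowSums over the selected columns, dropping NAs
dt[, summed := rowSums(.SD, na.rm = TRUE), .SDcols = c("col1", "col2")]

dt$patched   # 0.5 1.0  NA
dt$summed    # 0.5 1.0 0.0
```

The two agree whenever at least one value is present. When a row is NA in every column, the patching approach leaves NA while rowSums with na.rm = TRUE returns 0, so the choice depends on which result you want for such rows.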

mnel answered Sep 20 '22 17:09
