
Treat NA as zero only when adding a number

Tags: r, na, data.table

When I sum two data tables by group, any NA in a group makes the result NA, i.e. NA + n = NA:

> dt1 <- data.table(Name=c("Joe","Ann"), "1"=c(0,NA), "2"=c(3,NA))
> dt1
   Name  1  2
1:  Joe  0  3
2:  Ann NA NA
> dt2 <- data.table(Name=c("Joe","Ann"), "1"=c(0,NA), "2"=c(2,3))
> dt2
   Name  1 2
1:  Joe  0 2
2:  Ann NA 3
> dtsum  <- rbind(dt1, dt2)[, lapply(.SD, sum), by=Name]
> dtsum
   Name  1  2
1:  Joe  0  5
2:  Ann NA NA

I don't want to replace every NA with 0. What I want is NA + NA = NA and NA + n = n, which would give the following result:

   Name  1  2
1:  Joe  0  5
2:  Ann NA  3

How is this done in R?

UPDATE: removed typo in dt1

R-obert asked Feb 24 '13 23:02



2 Answers

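You can define your own function that behaves the way you want:

plus <- function(x) {
  if (all(is.na(x))) {
    # All values missing: return an NA of the same type as x
    c(x[0], NA)
  } else {
    # Otherwise sum the non-missing values
    sum(x, na.rm = TRUE)
  }
}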


rbind(dt1, dt2)[,lapply(.SD, plus), by = Name]
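
As a quick check of the helper outside data.table, here is a minimal sketch that applies `plus` to plain numeric vectors. The input vectors are made up for illustration; they only mirror the kinds of groups that appear in the question (one value missing, all values missing, none missing).

```r
# Same helper as above, repeated so the snippet runs on its own
plus <- function(x) {
  if (all(is.na(x))) {
    c(x[0], NA)            # NA of the same type as x
  } else {
    sum(x, na.rm = TRUE)   # ignore NAs when at least one value is present
  }
}

one_missing <- c(NA_real_, 3)         # illustrative input
all_missing <- c(NA_real_, NA_real_)  # illustrative input
none_missing <- c(3, 2)               # illustrative input

plus(one_missing)    # behaves like NA + n = n
plus(all_missing)    # stays NA
plus(none_missing)   # ordinary sum
```

Returning `c(x[0], NA)` rather than a bare `NA` keeps the result the same type as the input, which presumably matters because every group in a grouped `lapply(.SD, ...)` call should return the same column type.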
mnel answered Nov 16 '22 00:11


dtsum <- rbind(dt1, dt2)[, lapply(.SD, function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}), by = Name]

(Includes @Arun's suggestion.) It is worth remembering na.rm = TRUE, which makes sum() skip missing values.
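
To see why the `all(is.na(x))` check is still needed on top of `na.rm = TRUE`, here is a small base R sketch. The variable names are hypothetical and the vectors simply stand in for Ann's two columns after `rbind`.

```r
ann_col1 <- c(NA_real_, NA_real_)  # NA in both tables (illustrative)
ann_col2 <- c(NA_real_, 3)         # NA in one table, 3 in the other

sum(ann_col2)                 # NA: one missing value makes the whole sum NA
sum(ann_col2, na.rm = TRUE)   # 3:  the missing value is dropped
sum(ann_col1, na.rm = TRUE)   # 0:  nothing is left, and an empty sum is 0
```

The last line is the case the question wants to avoid: without the extra check, a group that is entirely NA would be reported as 0 instead of NA.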

alexwhan answered Nov 16 '22 01:11