The sum function returns 0 when it is applied to an empty set, so with na.rm = TRUE it also returns 0 for a set that contains only NA values. Is there a simple way to make it return NA in that case instead?
Here is a borrowed example:
test <- data.frame(name = rep(c("A", "B", "C"), each = 4),
                   var1 = rep(c(1:3, NA), 3),
                   var2 = 1:12,
                   var3 = c(rep(NA, 4), 1:8))
test
   name var1 var2 var3
1     A    1    1   NA
2     A    2    2   NA
3     A    3    3   NA
4     A   NA    4   NA
5     B    1    5    1
6     B    2    6    2
7     B    3    7    3
8     B   NA    8    4
9     C    1    9    5
10    C    2   10    6
11    C    3   11    7
12    C   NA   12    8
For each name, I would like the sum of each of the three variables. Here is what I tried:
var_to_aggr <- c("var1","var2","var3")
aggr_by <- "name"
summed <- aggregate(test[var_to_aggr], by = test[aggr_by], FUN = "sum", na.rm = TRUE)
This gives me:
  name var1 var2 var3
1    A    6   10    0
2    B    6   26   10
3    C    6   42   26
But I need:
  name var1 var2 var3
1    A    6   10   NA
2    B    6   26   10
3    C    6   42   26
The sum for name A, var3 should be NA, not 0. To be clear, the sum for name A, var1 should not be NA: that set contains one NA but also valid values that should be summed. Any ideas?
I have been fiddling with na.action, but sum() does not take an na.action argument.
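The 0 for name A, var3 comes from the order of operations inside sum(): with na.rm = TRUE the missing values are removed first, and the sum of the empty vector that remains is 0. A minimal base-R illustration, using two hypothetical vectors x and y as stand-ins for the A/var3 and A/var1 values:

```r
# Hypothetical stand-ins for two of the groups in the question.
x <- rep(NA_integer_, 4)   # like var3 for name "A": every value missing
y <- c(1L, 2L, 3L, NA)     # like var1 for name "A": one value missing

sum(x, na.rm = TRUE)   # 0: dropping the NAs leaves an empty vector
sum(integer(0))        # 0: the sum of an empty vector
sum(y, na.rm = TRUE)   # 6: only the single NA is dropped

# The check that separates the two cases:
all(is.na(x))          # TRUE
all(is.na(y))          # FALSE
```

So na.rm alone cannot tell "all values missing" apart from "some values missing"; that needs an explicit all(is.na(...)) test.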
You can try
# NA if every value is missing; otherwise the sum of the non-missing values
f1 <- function(x) if (all(is.na(x))) NA_integer_ else sum(x, na.rm = TRUE)
aggregate(. ~ name, test, FUN = f1, na.action = NULL)
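A self-contained sketch of the aggregate() approach that rebuilds the example data so it can be run and checked on its own. The comment on na.action reflects the formula method's default of na.omit, which would otherwise drop every row containing an NA in any column (and with it all of group A) before f1 is ever called.

```r
# Example data from the question.
test <- data.frame(name = rep(c("A", "B", "C"), each = 4),
                   var1 = rep(c(1:3, NA), 3),
                   var2 = 1:12,
                   var3 = c(rep(NA, 4), 1:8))

# NA if every value in the group is missing; otherwise the sum of the rest.
f1 <- function(x) if (all(is.na(x))) NA_integer_ else sum(x, na.rm = TRUE)

# `. ~ name` aggregates every other column by name. na.action = NULL keeps
# rows with NAs, which the default na.omit would drop before f1 runs.
res <- aggregate(. ~ name, data = test, FUN = f1, na.action = NULL)
print(res)
```

Passing na.action = na.pass should have the same effect here; in both cases it is f1 that turns the all-NA group into NA.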
Or
library(dplyr)
test %>%
  group_by(name) %>%
  summarise(across(everything(), f1))
Or
library(data.table)
setDT(test)[, lapply(.SD, f1), name]
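A different technique, not part of the answer above and offered only as a base-R sketch, avoids calling an R function once per group: compute the NA-removed sums and the non-missing counts per group with rowsum(), then set any sum whose count is zero to NA. The names vars, sums, counts and result are introduced here for illustration.

```r
# Example data from the question.
test <- data.frame(name = rep(c("A", "B", "C"), each = 4),
                   var1 = rep(c(1:3, NA), 3),
                   var2 = 1:12,
                   var3 = c(rep(NA, 4), 1:8))
vars <- c("var1", "var2", "var3")

# Per-group sums with NAs removed; an all-NA group still comes out as 0 here.
sums <- rowsum(test[vars], test$name, na.rm = TRUE)

# Per-group counts of non-missing values: is.na() on a data frame returns a
# logical matrix, and multiplying by 1L turns it into 0/1 integers.
counts <- rowsum((!is.na(test[vars])) * 1L, test$name)

# Both results have one row per group in the same (sorted) order, so a zero
# count marks exactly the sums that should be NA.
sums[counts == 0] <- NA

# Move the group labels from the row names into a regular column.
result <- data.frame(name = rownames(sums), sums, row.names = NULL)
print(result)
```

The trade-off is an extra pass over the data to build the counts, in exchange for no per-group function calls.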