During a simulation I created multiple data sets with > 1,000,000 variables. However, some of the values of these variables are NA, and in some cases even all values are NA. Now I'd like to calculate the sum of all values of the variables but want to get NA if all values are NA.
The problem with the common sum(x, na.rm = TRUE) or sum(na.omit(x)) is that it returns 0 if all values are NA. Thus, I've written my own function that deals with NA in the expected way:
sumna <- function(x) {
sumna <- NULL
return(ifelse(all(is.na(x)), NA, sum(na.omit(x))))
}
However, that implementation is rather slow.
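To make the behavior described above concrete, here is a minimal sketch using made-up vectors (the names all_na and mixed are only for illustration). It shows that both common idioms return 0 for an all-NA vector, while the custom function returns NA:

```r
# Illustrative vectors, not taken from the actual simulation data
all_na <- c(NA_real_, NA_real_, NA_real_)
mixed  <- c(1, NA, 2)

# The custom function from the question
sumna <- function(x) {
  ifelse(all(is.na(x)), NA, sum(na.omit(x)))
}

# Both common idioms collapse an all-NA vector to 0
sum(all_na, na.rm = TRUE)
sum(na.omit(all_na))

# The custom function keeps the NA, but still sums mixed input
sumna(all_na)
sumna(mixed)
```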
Thus, I'm looking for an implementation or pre-implemented function that sums up the values of a vector, omits NA, and returns NA if all values are NA.
Many thanks in advance!
The sum_ function from hablar has the behavior the OP wanted, so there is no need to reinvent the wheel:
library(hablar)
sum_(c(1:10, NA))
#[1] 55
sum_(c(NA, NA, NA))
#[1] NA
It can also be used in tidyverse or data.table code, for example with dplyr:
library(dplyr)
df1 %>%
summarise_all(sum_)
But if we need to change the OP's custom function, a better option than ifelse is if/else. ifelse is meant for element-wise tests on vectors, whereas the condition here is a single TRUE or FALSE:
sumna <- function(x) {
if(all(is.na(x))) NA else sum(x, na.rm = TRUE)
}
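As a rough way to compare the two variants, the sketch below times repeated calls with base R's system.time. The names sumna_ifelse and sumna_if, the test vector, and the number of calls are assumptions made for illustration. Actual timings depend on the machine and the data, so none are shown here:

```r
# Hypothetical names: sumna_ifelse mirrors the question's version,
# sumna_if mirrors the if/else rewrite
sumna_ifelse <- function(x) ifelse(all(is.na(x)), NA, sum(na.omit(x)))
sumna_if     <- function(x) if (all(is.na(x))) NA else sum(x, na.rm = TRUE)

# Made-up test data: one vector with some NA, one with only NA
set.seed(1)
x      <- sample(c(1:10, NA), 100, replace = TRUE)
all_na <- rep(NA_real_, 100)

# Both versions should agree on the result
stopifnot(sumna_ifelse(x) == sumna_if(x))
stopifnot(is.na(sumna_ifelse(all_na)), is.na(sumna_if(all_na)))

# Time many repeated calls of each version (arbitrary repetition count)
n_calls  <- 20000
t_ifelse <- system.time(for (i in seq_len(n_calls)) sumna_ifelse(x))
t_if     <- system.time(for (i in seq_len(n_calls)) sumna_if(x))

# Elapsed seconds for each variant
print(c(ifelse = t_ifelse[["elapsed"]], if_else = t_if[["elapsed"]]))
```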
Also, we can use the vectorized colSums and then set the sum of every all-NA column back to NA:
v1 <- colSums(df1, na.rm = TRUE)
v1[colSums(is.na(df1)) == nrow(df1)] <- NA
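A small worked example of the colSums approach, assuming a made-up numeric data frame df1 in which column b contains only NA:

```r
# Hypothetical data: column b is entirely NA
df1 <- data.frame(
  a = c(1, 2, NA),
  b = c(NA_real_, NA_real_, NA_real_),
  c = c(4, 5, 6)
)

# Column sums ignoring NA; the all-NA column b comes out as 0 here
v1 <- colSums(df1, na.rm = TRUE)

# A column is all-NA when its NA count equals the number of rows
v1[colSums(is.na(df1)) == nrow(df1)] <- NA

print(v1)
```

Note that colSums expects numeric columns, so this sketch assumes the data frame holds no character or factor variables.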
As the dataset is huge, we can also make use of the efficient data.table:
library(data.table)
setDT(df1)[, lapply(.SD, sumna)]
Or using tidyverse:
library(tidyverse)
df1 %>%
summarise_all(sumna)
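For a quick check without any extra packages, the same per-column application can be sketched in base R with sapply. This is an alternative added for illustration, not part of the package-based approaches, and the data frame df1 below is made up:

```r
# The if/else version of the custom function
sumna <- function(x) {
  if (all(is.na(x))) NA else sum(x, na.rm = TRUE)
}

# Hypothetical data: column b is entirely NA
df1 <- data.frame(
  a = c(1, 2, NA),
  b = c(NA_real_, NA_real_, NA_real_),
  c = c(4, 5, 6)
)

# Apply sumna to every column; the result is a named vector
res <- sapply(df1, sumna)
print(res)
```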