I have a data frame with a large number of variables. I am creating new variables by adding together some of the old ones. The code I am using to do so is:
name_of_data_frame<- transform(name_of_data_frame, new_variable=var1+var2 +....)
When transform comes across an NA in one of the observations, it returns NA in the new variable, even if some of the other variables it was adding were not NA.
e.g. if var1=4, var2=3, var3=NA, then using transform, if I did var1+var2+var3 it would give out NA, whereas I would like it to give me 7.
I don't want to recode my NAs to zero within the data frame, as I may need to refer back to the NAs later, so don't want to confuse the NAs with the observations which were genuinely 0.
Any help on how to get around R treating NAs in the way described above with the transform function would be great (or if there are alternative functions to use, that would be great also).
Please note that I am not always just summing variables that are next to each other, I am also often dividing variables, multiplying, subtracting etc.
My first instinct was to suggest using sum(), since then you can use the na.rm argument. However, this doesn't work, since sum() reduces its arguments to a single scalar value, not a vector.
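To see the problem concretely, here is a small sketch (the vectors x and y are made up for illustration) showing that sum() pools everything into one number instead of adding element by element:

```r
# Two illustrative vectors of equal length, each containing NAs
x <- c(1, 2, 3, NA)
y <- c(NA, 4, 5, NA)

# sum() pools all non-missing values from both vectors into one total
total <- sum(x, y, na.rm = TRUE)

length(total)  # 1, not 4
total          # 15
```

Inside transform(), that single value would simply be recycled down the whole column, which is not what the question asks for.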
This means you need to write a parallel sum function. Let's call this psum(), similar to the base R functions pmin() and pmax():
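psum <- function(..., na.rm = FALSE) {
    # Collect the input vectors (assumed to be of equal length)
    x <- list(...)
    # Lay them out as columns of a matrix, then sum across each row
    rowSums(matrix(unlist(x), ncol = length(x)), na.rm = na.rm)
}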
Now set up some data and use psum() to get the desired vector:
dat <- data.frame(
    x = c(1, 2, 3, NA),
    y = c(NA, 4, 5, NA))
transform(dat, new = psum(x, y, na.rm = TRUE))
   x  y new
1  1 NA   1
2  2  4   6
3  3  5   8
4 NA NA   0
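Note that row 4, where every input is NA, comes out as 0. Since the question wants to keep missing values distinguishable from genuine zeros, one possible variant is sketched below. The name psum_keep_na() is hypothetical, and the rule "return NA only when all inputs in a row are NA" is an assumption about the desired behaviour:

```r
# Hypothetical variant: ignore NAs when summing, but return NA
# for rows in which every input is NA
psum_keep_na <- function(...) {
    m <- cbind(...)                        # inputs as columns
    total <- rowSums(m, na.rm = TRUE)      # NA-ignoring row sums
    total[rowSums(!is.na(m)) == 0] <- NA   # rows with no observed values
    total
}

dat <- data.frame(
    x = c(1, 2, 3, NA),
    y = c(NA, 4, 5, NA))
res <- transform(dat, new = psum_keep_na(x, y))
res
```

With this data the new column should be 1, 6, 8 and NA, so the all-missing row stays missing instead of turning into 0.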
Similarly, you can define a parallel product, or pprod(), like this:
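pprod <- function(..., na.rm = FALSE) {
    x <- list(...)
    # One column per input vector
    m <- matrix(unlist(x), ncol = length(x))
    # Multiply across each row, passing the caller's na.rm through
    apply(m, 1, prod, na.rm = na.rm)
}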
transform(dat, new = pprod(x, y, na.rm = TRUE))
   x  y new
1  1 NA   1
2  2  4   8
3  3  5  15
4 NA NA   1
This example of pprod() provides a general template for what you want to do: create a function that uses apply() to summarize a matrix of input into the desired vector.
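As a rough sketch of that template, the helper below takes the summary function as an argument. The name prowwise() and its FUN parameter are hypothetical, and it assumes FUN accepts an na.rm argument, as sum(), prod() and mean() do:

```r
# Hypothetical generic helper: apply any NA-aware summary function
# across the rows formed by the input vectors
prowwise <- function(..., FUN, na.rm = FALSE) {
    m <- cbind(...)                  # inputs as columns
    apply(m, 1, FUN, na.rm = na.rm)  # one summary value per row
}

dat <- data.frame(
    x = c(1, 2, 3, NA),
    y = c(NA, 4, 5, NA))

# Row-wise mean that skips missing values
res <- transform(dat, avg = prowwise(x, y, FUN = mean, na.rm = TRUE))
res
```

Rows with no observed values return whatever FUN gives for empty input (NaN for mean(), 0 for sum(), 1 for prod()), so check that case for the function you plug in.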