I have daily observations with lots of missing values and am trying to carry the first non-missing value forward through a vector for each individual. In my searching so far, I found the na.locf function in the zoo package; however, I now need to apply this function separately for each value of the id variable in my data frame. Is ddply the right function for this? If so, could someone please help me figure out how to get the output into a new variable called result in the same data frame?
This is what I have so far:
# Load required libraries
library(zoo)
library(plyr)
# Create the data
data <- structure(list(id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
2, 2, 2), day = c(0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 7,
8), value = c("NA", "1", "NA", "NA", "NA", "NA", "NA", "NA",
"NA", "NA", "1", "NA", "NA", "NA", "NA", "NA")), .Names = c("id",
"day", "value"), row.names = c(NA, -16L), class = "data.frame")
# Propagate the value of the first non-missing observation in data$value forward for each id
data$result <- na.locf(data$value, na.rm = FALSE)
Any thoughts on how to run the na.locf function by id would be greatly appreciated. Thanks!
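A quick way to see why the attempt above has no effect: a minimal sketch using a small hypothetical vector `v` and id vector `g` rather than the full data. The "NA" entries are strings, so `is.na()` does not flag them, and even once they are real `NA` values an ungrouped `na.locf` carries a value from one id into the next.

```r
library(zoo)

# Character "NA" strings are not missing values, so na.locf has nothing to fill
v_chr <- c("NA", "1", "NA")
is.na(v_chr)                       # FALSE FALSE FALSE

# After converting to numeric, the "NA" strings become real NA values
v <- as.numeric(c("NA", "1", "NA", "NA", "NA"))
g <- c(1, 1, 1, 2, 2)              # hypothetical ids: rows 4-5 belong to id 2
is.na(v)                           # TRUE FALSE TRUE TRUE TRUE

# Ungrouped: the 1 from id 1 leaks into the rows of id 2
na.locf(v, na.rm = FALSE)          # NA 1 1 1 1
```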
1) First, note that the value column is a character column containing the string "NA", not actual NA values, so let's fix that first (the line marked ##). Then create a wrapper function, na.locf.na, which calls na.locf from the zoo package and is identical except that it defaults to na.rm = FALSE, so leading NAs are kept and the output has the same length as the input. Finally, use ave to apply na.locf.na by id:
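library(zoo)
data2 <- transform(data, value = as.numeric(value)) ## convert "NA" strings to real NA values
# same as na.locf but defaults to na.rm = FALSE, so output length matches input
na.locf.na <- function(x, na.rm = FALSE, ...) na.locf(x, na.rm = na.rm, ...)
# apply na.locf.na separately within each id
transform(data2, value = ave(value, id, FUN = na.locf.na))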
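Since the question asks for a new column called result rather than overwriting value, here is a minimal variant of option 1 that assigns to `result` instead. The small data frame `df` is a hypothetical example in which id 1 ends with a non-missing value, so it also shows that ave keeps each id's values from spilling into the next id.

```r
library(zoo)

# Same wrapper as above: na.locf that keeps leading NAs (and thus the vector length)
na.locf.na <- function(x, na.rm = FALSE, ...) na.locf(x, na.rm = na.rm, ...)

# Hypothetical example data: id 1 ends with 5, id 2 starts with NA
df <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                 value = c(NA, 5, NA, NA, 7, NA))

# ave() splits value by id, applies na.locf.na to each piece,
# and puts the results back in the original row order
df$result <- ave(df$value, df$id, FUN = na.locf.na)
df$result   # NA 5 5 NA 7 7  (row 4 stays NA: the 5 from id 1 does not leak)
```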
2) Or, more compactly, use fn from the gsubfn package to write na.locf.na inline as a formula:
library(zoo)
library(gsubfn)
transform(data2, value = fn$ave(value, id, FUN = ~ na.locf(x, na.rm = FALSE)))
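If you would rather not add gsubfn as a dependency, the formula `~ na.locf(x, na.rm = FALSE)` is essentially shorthand for an anonymous function of `x`. A sketch of the same call written with a plain base-R anonymous function; `data2` is rebuilt here as a small hypothetical stand-in so the block runs on its own.

```r
library(zoo)

# Small stand-in for data2 (value already numeric)
data2 <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                    day = c(0, 1, 2, 0, 1, 2),
                    value = c(NA, 1, NA, NA, NA, 1))

# Same idea as fn$ave(..., FUN = ~ na.locf(x, na.rm = FALSE)),
# but with an ordinary anonymous function instead of a gsubfn formula
out <- transform(data2,
                 value = ave(value, id, FUN = function(x) na.locf(x, na.rm = FALSE)))
out$value   # NA 1 1 NA NA 1
```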
In either of these two cases the result is:
   id day value
1   1   0    NA
2   1   1     1
3   1   2     1
4   1   3     1
5   1   4     1
6   1   5     1
7   1   6     1
8   2   0    NA
9   2   1    NA
10  2   2    NA
11  2   3     1
12  2   4     1
13  2   5     1
14  2   6     1
15  2   7     1
16  2   8     1
3) Alternatively, we could use dplyr together with zoo, reusing na.locf.na from above:
library(zoo)
library(dplyr)
data2 <- data %>% mutate(value = as.numeric(value)) # fix value column
data2 %>% group_by(id) %>% mutate(value = na.locf.na(value))
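A sketch of the same dplyr pipeline that writes to a new result column (as the question asks) and drops the grouping afterwards with ungroup(); the small `data2` here is a hypothetical stand-in with value already numeric.

```r
library(zoo)
library(dplyr)

# Same wrapper as in option 1
na.locf.na <- function(x, na.rm = FALSE, ...) na.locf(x, na.rm = na.rm, ...)

# Hypothetical example data with the value column already numeric
data2 <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                    value = c(NA, 5, NA, NA, 7, NA))

# group_by() makes mutate() run na.locf.na separately within each id;
# ungroup() removes the grouping so later steps see an ordinary tibble
res <- data2 %>%
  group_by(id) %>%
  mutate(result = na.locf.na(value)) %>%
  ungroup()
res$result   # NA 5 5 NA 7 7
```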
If the CRAN version of dplyr does not work here, try the development version from GitHub:
library(devtools)
install_github("hadley/dplyr")
REVISIONS Reorganized presentation and added alternatives.