
Carry Last Observation Forward by ID in R

Tags: r, na, zoo

I have daily observations with lots of missing values and am trying to propagate the first non-missing value through a vector for each individual.

In my searching so far, I discovered the na.locf function in the zoo package; however, I now need to apply this function separately for each value of the id variable in my data frame. Is ddply the right function for this? If so, could someone please help me figure out how to store the output in a new variable called result in the same data frame?

This is what I have so far:

# Load required libraries
library(zoo)
library(plyr)

# Create the data
data <- structure(list(id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 
              2, 2, 2), day = c(0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 7, 
              8), value = c("NA", "1", "NA", "NA", "NA", "NA", "NA", "NA", 
              "NA", "NA", "1", "NA", "NA", "NA", "NA", "NA")), .Names = c("id", 
              "day", "value"), row.names = c(NA, -16L), class = "data.frame")

# Propagate the value of the first non-missing observation in data$value forward for each id
data$result <- na.locf(data$value, na.rm = FALSE)

Any thoughts on how to run the na.locf function separately for each id would be greatly appreciated. Thanks!
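For comparison, here is a minimal base-R sketch of grouped last-observation-carried-forward that needs neither zoo nor plyr. It is a swapped-in technique rather than na.locf itself: `locf_base` is a hypothetical helper that finds the position of the latest non-missing value with cummax() and indexes with it, and the sketch assumes the value column already holds real NA values rather than "NA" strings. ave() then applies the helper within each id and writes the output to a new result column, next to an ungrouped run that shows why grouping matters.

```r
# Hypothetical helper: carry the last non-missing value forward within one vector.
# Leading NAs (before any observed value) stay NA.
locf_base <- function(x) {
  # Position of each non-missing element, 0 for missing ones
  pos <- seq_along(x) * !is.na(x)
  # Running maximum gives the position of the latest non-missing value so far
  idx <- cummax(pos)
  # No observation yet: index with NA so the result stays NA
  idx[idx == 0] <- NA
  x[idx]
}

# Same layout as the question's data, but with real NA values instead of "NA" strings
data <- data.frame(
  id    = c(rep(1, 7), rep(2, 9)),
  day   = c(0:6, 0:8),
  value = c(NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA)
)

# Applying the helper to the whole column ignores id, so values leak across ids
data$ungrouped <- locf_base(data$value)

# ave() splits value by id, applies the helper to each piece, and reassembles in row order
data$result <- ave(data$value, data$id, FUN = locf_base)
print(data)
```

In the ungrouped column, rows 8 to 10 (id 2, days 0 to 2) pick up the 1 from id 1, while the grouped result column leaves them NA.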

asked Mar 19 '23 13:03 by Entropy


1 Answer

1) First, note that the value column is a character column containing the string "NA" rather than actual NA values, so let's fix that first (see the line marked ##). Then create a wrapper function na.locf.na that calls na.locf from the zoo package but defaults to na.rm = FALSE, so leading NAs are kept and the output has the same length as the input. Finally, use ave to apply na.locf.na by id:

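library(zoo)

# Convert the character "NA" strings to real NA values
data2 <- transform(data, value = as.numeric(value)) ##

# Same as na.locf, except that na.rm defaults to FALSE so leading NAs are kept
na.locf.na <- function(x, na.rm = FALSE, ...) na.locf(x, na.rm = na.rm, ...)
# Apply na.locf.na separately within each id
transform(data2, value = ave(value, id, FUN = na.locf.na))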
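The ## step matters because the question's column holds the two-character string "NA", which is.na() treats as an ordinary value, so na.locf sees nothing to fill. A quick check on a short hypothetical vector `v` standing in for the value column:

```r
# Hypothetical stand-in for the question's character value column
v <- c("NA", "1", "NA")

# The strings "NA" are not missing values, so nothing would be carried forward
print(is.na(v))

# as.numeric() turns the string "NA" into a real NA and "1" into the number 1
v_num <- as.numeric(v)
print(is.na(v_num))
print(v_num)
```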

2) Or use this more compact alternative, which relies on fn from the gsubfn package to write na.locf.na inline as a formula:

library(zoo)
library(gsubfn)

transform(data2, value = fn$ave(value, id, FUN = ~ na.locf(x, na.rm = FALSE)))

In either of these two cases the result is:

   id day value
1   1   0    NA
2   1   1     1
3   1   2     1
4   1   3     1
5   1   4     1
6   1   5     1
7   1   6     1
8   2   0    NA
9   2   1    NA
10  2   2    NA
11  2   3     1
12  2   4     1
13  2   5     1
14  2   6     1
15  2   7     1
16  2   8     1
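Both versions rely on ave(), which splits its first argument by the grouping variable, applies FUN to each piece and writes the results back in the original row order. A rough base-R equivalent of that step, written with split() and unsplit(), is sketched below. `fill_by_group` and `fill_down` are hypothetical names, and fill_down is a simple loop-based stand-in for na.locf(x, na.rm = FALSE) so the sketch runs without zoo.

```r
# Simple stand-in for na.locf(x, na.rm = FALSE): fill each NA with the previous value
fill_down <- function(x) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}

# Split x by group, apply f to each piece, then put the pieces back in row order
fill_by_group <- function(x, g, f) {
  unsplit(lapply(split(x, g), f), g)
}

value <- c(NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA)
id    <- c(rep(1, 7), rep(2, 9))

res_split <- fill_by_group(value, id, fill_down)
res_ave   <- ave(value, id, FUN = fill_down)
print(identical(res_split, res_ave))
```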

3) Alternatively, we could use dplyr together with zoo, reusing na.locf.na from above:

library(zoo)
library(dplyr)

data2 <- data %>% mutate(value = as.numeric(value)) # fix value column
data2 %>% group_by(id) %>% mutate(value = na.locf.na(value))

If the CRAN version of dplyr does not work here, try the version on GitHub:

library(devtools)
install_github("hadley/dplyr")

REVISIONS: Reorganized the presentation and added alternatives.

answered Apr 20 '23 06:04 by G. Grothendieck