na.locf using group_by from dplyr

Tags: r, dplyr, zoo

I'm trying to use na.locf from package zoo with grouped data using dplyr. I'm using the first solution on this question: Using dplyr window-functions to make trailing values (fill in NA values)

library(dplyr);library(zoo)
df1 <- data.frame(id=rep(c("A","B"),each=3),problem=c(1,NA,2,NA,NA,NA),ok=c(NA,3,4,5,6,NA))
df1
  id problem ok
1  A       1 NA
2  A      NA  3
3  A       2  4
4  B      NA  5
5  B      NA  6
6  B      NA NA

The problem happens when all the data within a group is NA. As you can see in the problem column, the filled values for id=B come from another group: the last value of id=A.

df1 %>% group_by(id) %>% na.locf()

Source: local data frame [6 x 3]
Groups: id [2]

     id problem    ok
  <chr>   <chr> <chr>
1     A       1  <NA>
2     A       1     3
3     A       2     4
4     B       2     5 #problem col is wrong
5     B       2     6 #problem col is wrong
6     B       2     6 #problem col is wrong

This is my expected result. The data for id=B should be independent of what is in id=A.

     id problem    ok
  <chr>   <chr> <chr>
1     A       1  <NA>
2     A       1     3
3     A       2     4
4     B      NA     5
5     B      NA     6
6     B      NA     6
Pierre Lapointe asked Apr 04 '17 16:04



1 Answer

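We need to call na.locf inside mutate_all. na.locf can also be applied directly to a whole data frame, which is what happens in the question: even though the data is grouped by 'id', na.locf then works on the full dataset and does not follow the grouping. Wrapping it in mutate_all applies it to each column within each group, and na.rm = FALSE keeps leading NAs so that every group keeps its original length.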

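# Note: funs() was deprecated in dplyr 0.8.0; newer versions accept
# list(~ na.locf(., na.rm = FALSE)) in its place.
df1 %>%
     group_by(id) %>%
     # na.rm = FALSE keeps leading NAs, so each column keeps its length
     mutate_all(funs(na.locf(., na.rm = FALSE)))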
#    id problem    ok
#  <fctr>   <dbl> <dbl>
#1      A       1    NA
#2      A       1     3
#3      A       2     4
#4      B      NA     5
#5      B      NA     6
#6      B      NA     6
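
For newer versions of dplyr, the same per-group fill can probably be written with across() instead of mutate_all() and funs(). This is a minimal sketch, assuming dplyr 1.0.0 or later and the df1 from the question; the name filled is only illustrative and is not part of the original answer.

```r
library(dplyr)
library(zoo)

# Same data as in the question
df1 <- data.frame(
  id      = rep(c("A", "B"), each = 3),
  problem = c(1, NA, 2, NA, NA, NA),
  ok      = c(NA, 3, 4, 5, 6, NA)
)

# 'filled' is an illustrative name (an assumption, not from the answer)
filled <- df1 %>%
  group_by(id) %>%
  # across(everything()) skips the grouping column, so na.locf only sees
  # 'problem' and 'ok', one group at a time
  mutate(across(everything(), ~ na.locf(.x, na.rm = FALSE))) %>%
  ungroup()

filled
```

If zoo is not otherwise needed, tidyr::fill(problem, ok) after group_by(id) should be another way to fill down within each group.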
akrun answered Nov 06 '22 02:11
