I'm trying to use na.locf from package zoo with grouped data using dplyr. I'm using the first solution on this question: Using dplyr window-functions to make trailing values (fill in NA values)
library(dplyr);library(zoo)
df1 <- data.frame(id=rep(c("A","B"),each=3),problem=c(1,NA,2,NA,NA,NA),ok=c(NA,3,4,5,6,NA))
df1
id problem ok
1 A 1 NA
2 A NA 3
3 A 2 4
4 B NA 5
5 B NA 6
6 B NA NA
The problem happens when all the data within a group is NA. As you can see in the problem column, the na.locf values for id=B come from another group: the last value of id=A.
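A minimal sketch of why the value leaks, using a plain vector that stands in for the problem column (group A first, then group B). With no grouping, na.locf just sees one long vector, so nothing stops the last value of A from being carried into B:

```r
library(zoo)

# Stand-in for df1$problem: rows 1-3 belong to id A, rows 4-6 to id B
problem <- c(1, NA, 2, NA, NA, NA)

# na.locf knows nothing about id, so the 2 from group A
# is carried forward into the all-NA rows of group B
whole <- na.locf(problem, na.rm = FALSE)
print(whole)
# [1] 1 1 2 2 2 2
```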
df1 %>% group_by(id) %>% na.locf()
Source: local data frame [6 x 3]
Groups: id [2]
id problem ok
<chr> <chr> <chr>
1 A 1 <NA>
2 A 1 3
3 A 2 4
4 B 2 5 #problem col is wrong
5 B 2 6 #problem col is wrong
6 B 2 6 #problem col is wrong
This is my expected result: the values for id=B should be independent of those in id=A.
id problem ok
<chr> <chr> <chr>
1 A 1 <NA>
2 A 1 3
3 A 2 4
4 B NA 5
5 B NA 6
6 B NA 6
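We need to use na.locf within mutate_all. When na.locf is applied directly to the dataset, it works on the full data frame and ignores the grouping: even though the data are grouped by 'id', na.locf does not follow any group_by behavior. Inside mutate_all, it is applied to each column separately within each group. Setting na.rm = FALSE keeps the leading NAs, so each column keeps its length within every group.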
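To illustrate what "within each group" means outside dplyr, here is a base-R sketch using ave() instead of mutate_all (a swapped-in technique, not the answer's method; locf_by_group is a hypothetical helper name). The last lines show why na.rm = FALSE matters: with the default na.rm = TRUE, leading NAs are dropped, so an all-NA group would come back shorter than it went in.

```r
library(zoo)

df1 <- data.frame(id = rep(c("A", "B"), each = 3),
                  problem = c(1, NA, 2, NA, NA, NA),
                  ok = c(NA, 3, 4, 5, 6, NA))

# Hypothetical helper: ave() splits x by g, applies FUN to each piece
# and writes the results back in their original positions
locf_by_group <- function(x, g) {
  ave(x, g, FUN = function(v) na.locf(v, na.rm = FALSE))
}

df1$problem <- locf_by_group(df1$problem, df1$id)
df1$ok <- locf_by_group(df1$ok, df1$id)
print(df1)

# With the default na.rm = TRUE, leading NAs are removed,
# so an all-NA group like B's problem values comes back empty
dropped <- na.locf(c(NA_real_, NA, NA))
length(dropped)  # 0
```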
df1 %>%
  group_by(id) %>%
  # na.locf runs on each non-grouping column separately within each id group
  mutate_all(funs(na.locf(., na.rm = FALSE)))
# id problem ok
# <fctr> <dbl> <dbl>
#1 A 1 NA
#2 A 1 3
#3 A 2 4
#4 B NA 5
#5 B NA 6
#6 B NA 6
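funs() has been deprecated since dplyr 0.8.0. A sketch of the same grouped fill written with across(), assuming dplyr 1.0.0 or later is installed; like mutate_all, across(everything()) inside a grouped mutate() leaves the grouping column id alone:

```r
library(dplyr)
library(zoo)

df1 <- data.frame(id = rep(c("A", "B"), each = 3),
                  problem = c(1, NA, 2, NA, NA, NA),
                  ok = c(NA, 3, 4, 5, 6, NA))

# Same idea as mutate_all(funs(...)): apply na.locf to every
# non-grouping column, one id group at a time
res <- df1 %>%
  group_by(id) %>%
  mutate(across(everything(), ~ na.locf(.x, na.rm = FALSE))) %>%
  ungroup()
print(res)
```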