Replace NA with previous or next value, by group, using dplyr

I have a data frame that is arranged in descending order of date.

    ps1 = data.frame(userID = c(21,21,21,22,22,22,23,23,23),
                     color = c(NA,'blue','red','blue',NA,NA,'red',NA,'gold'),
                     age = c('3yrs','2yrs',NA,NA,'3yrs',NA,NA,'4yrs',NA),
                     gender = c('F',NA,'M',NA,NA,'F','F',NA,'F')
    )

I wish to impute (replace) NA values with the previous value, grouped by userID. If the first row of a userID group is NA, it should be replaced with the next non-NA value within that userID group.

I am trying to use the dplyr and zoo packages, something like this, but it's not working:

    cleanedFUG <- filteredUserGroup %>%
      group_by(UserID) %>%
      mutate(Age1 = na.locf(Age),
             Color1 = na.locf(Color),
             Gender1 = na.locf(Gender))

I need a result data frame like this:

      userID color  age gender
    1     21  blue 3yrs      F
    2     21  blue 2yrs      F
    3     21   red 2yrs      M
    4     22  blue 3yrs      F
    5     22  blue 3yrs      F
    6     22  blue 3yrs      F
    7     23   red 4yrs      F
    8     23   red 4yrs      F
    9     23  gold 4yrs      F
Tarak asked Oct 14 '16 10:10

2 Answers

    require(tidyverse) # fill() is part of tidyr

    ps1 %>%
      group_by(userID) %>%
      fill(color, age, gender) %>% # default direction is "down"
      fill(color, age, gender, .direction = "up")

Which gives you:

    Source: local data frame [9 x 4]
    Groups: userID [3]

      userID  color    age gender
       <dbl> <fctr> <fctr> <fctr>
    1     21   blue   3yrs      F
    2     21   blue   2yrs      F
    3     21    red   2yrs      M
    4     22   blue   3yrs      F
    5     22   blue   3yrs      F
    6     22   blue   3yrs      F
    7     23    red   4yrs      F
    8     23    red   4yrs      F
    9     23   gold   4yrs      F
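In more recent versions of tidyr (1.0.0 and later, which is an assumption about your installed version), the two passes can probably be collapsed into a single call with `.direction = "downup"`. A minimal sketch on the question's data, where the name `filled` is only illustrative:

```r
library(dplyr)
library(tidyr)

# Sample data from the question
ps1 <- data.frame(
  userID = c(21, 21, 21, 22, 22, 22, 23, 23, 23),
  color  = c(NA, "blue", "red", "blue", NA, NA, "red", NA, "gold"),
  age    = c("3yrs", "2yrs", NA, NA, "3yrs", NA, NA, "4yrs", NA),
  gender = c("F", NA, "M", NA, NA, "F", "F", NA, "F")
)

# "downup" fills downwards first, then upwards, within each userID group.
# Assumes tidyr >= 1.0.0; `filled` is an illustrative name.
filled <- ps1 %>%
  group_by(userID) %>%
  fill(color, age, gender, .direction = "downup") %>%
  ungroup()

print(filled)
```

The trailing `ungroup()` is optional; it just avoids carrying the grouping into later steps.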
Rentrop answered Oct 13 '22 18:10

Using zoo::na.locf directly on the whole data.frame would fill the NAs regardless of the userID groups. Unfortunately, dplyr's grouping has no effect when the whole data frame is passed to na.locf, which is why I went with a split:

    library(dplyr); library(zoo)

    ps1 %>% split(ps1$userID) %>%
      lapply(function(x) {na.locf(na.locf(x), fromLast = TRUE)}) %>%
      do.call(rbind, .)
    ####      userID color  age gender
    #### 21.1     21  blue 3yrs      F
    #### 21.2     21  blue 2yrs      F
    #### 21.3     21   red 2yrs      M
    #### 22.4     22  blue 3yrs      F
    #### 22.5     22  blue 3yrs      F
    #### 22.6     22  blue 3yrs      F
    #### 23.7     23   red 4yrs      F
    #### 23.8     23   red 4yrs      F
    #### 23.9     23  gold 4yrs      F

This first splits the data into three data.frames, one per userID. The anonymous function in lapply then applies a first imputation pass downwards and a second pass upwards (fromLast), and finally rbind brings the data.frames back together. This gives the expected output.
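As a possible alternative to splitting, na.locf can also be called column by column inside a grouped `mutate()`, provided `na.rm = FALSE` is set. By default na.locf drops leading NAs, so the result is shorter than the group, which is a likely reason the attempt in the question fails. The sketch below assumes the question's data; `fill_down_up` and `filled` are hypothetical names introduced only for illustration:

```r
library(dplyr)
library(zoo)

# Sample data from the question
ps1 <- data.frame(
  userID = c(21, 21, 21, 22, 22, 22, 23, 23, 23),
  color  = c(NA, "blue", "red", "blue", NA, NA, "red", NA, "gold"),
  age    = c("3yrs", "2yrs", NA, NA, "3yrs", NA, NA, "4yrs", NA),
  gender = c("F", NA, "M", NA, NA, "F", "F", NA, "F")
)

# Hypothetical helper: carry the last observation forward, then the next
# observation backward. na.rm = FALSE keeps leading/trailing NAs in place,
# so the output always has the same length as the input vector.
fill_down_up <- function(x) {
  x <- na.locf(x, na.rm = FALSE)
  na.locf(x, fromLast = TRUE, na.rm = FALSE)
}

# Inside a grouped mutate(), the helper sees one userID group at a time.
filled <- ps1 %>%
  group_by(userID) %>%
  mutate(color  = fill_down_up(color),
         age    = fill_down_up(age),
         gender = fill_down_up(gender)) %>%
  ungroup()

print(filled)
```

Unlike the split/rbind version, this keeps the original row names and returns a tibble rather than a plain data.frame.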

agenis answered Oct 13 '22 16:10