Replace NA with previous or next value, by group, using dplyr

Tags: r, missing-data, dplyr, zoo

I have a data frame that is sorted by date in descending order.

    ps1 = data.frame(userID = c(21,21,21,22,22,22,23,23,23),
                     color = c(NA,'blue','red','blue',NA,NA,'red',NA,'gold'),
                     age = c('3yrs','2yrs',NA,NA,'3yrs',NA,NA,'4yrs',NA),
                     gender = c('F',NA,'M',NA,NA,'F','F',NA,'F')
    )

I want to impute (replace) NA values with the previous non-NA value, grouped by userID. If the first row of a userID group is NA, it should be replaced with the next non-NA value in that group.

I am trying to use the dplyr and zoo packages, with something like this, but it's not working:

    cleanedFUG <- filteredUserGroup %>%
      group_by(UserID) %>%
      mutate(Age1 = na.locf(Age),
             Color1 = na.locf(Color),
             Gender1 = na.locf(Gender)
      )

I need a result data frame like this:

      userID color  age gender
    1     21  blue 3yrs      F
    2     21  blue 2yrs      F
    3     21   red 2yrs      M
    4     22  blue 3yrs      F
    5     22  blue 3yrs      F
    6     22  blue 3yrs      F
    7     23   red 4yrs      F
    8     23   red 4yrs      F
    9     23  gold 4yrs      F


asked Oct 14 '16 10:10

Tarak

2 Answers

    require(tidyverse) # fill is part of tidyr

    ps1 %>%
      group_by(userID) %>%
      fill(color, age, gender) %>% # default direction is down
      fill(color, age, gender, .direction = "up")

Which gives you:

    Source: local data frame [9 x 4]
    Groups: userID [3]

      userID  color    age gender
       <dbl> <fctr> <fctr> <fctr>
    1     21   blue   3yrs      F
    2     21   blue   2yrs      F
    3     21    red   2yrs      M
    4     22   blue   3yrs      F
    5     22   blue   3yrs      F
    6     22   blue   3yrs      F
    7     23    red   4yrs      F
    8     23    red   4yrs      F
    9     23   gold   4yrs      F
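In recent versions of tidyr (1.0.0 or later, as far as I know), the two fill() calls above can probably be collapsed into one with `.direction = "downup"`, which fills downwards first and then upwards. The sketch below is an illustration rather than part of the original answer; `stringsAsFactors = FALSE` and the name `filled` are my own additions so that the columns stay character vectors on any R version.

```r
library(dplyr)
library(tidyr)

# Same data as in the question; stringsAsFactors = FALSE is an added
# assumption so the columns are character on old and new R versions.
ps1 <- data.frame(
  userID = c(21, 21, 21, 22, 22, 22, 23, 23, 23),
  color  = c(NA, "blue", "red", "blue", NA, NA, "red", NA, "gold"),
  age    = c("3yrs", "2yrs", NA, NA, "3yrs", NA, NA, "4yrs", NA),
  gender = c("F", NA, "M", NA, NA, "F", "F", NA, "F"),
  stringsAsFactors = FALSE
)

filled <- ps1 %>%
  group_by(userID) %>%
  # "downup": carry the previous value down, then fill leading NAs from below
  fill(color, age, gender, .direction = "downup") %>%
  ungroup()

print(filled)
```

Calling ungroup() at the end is optional; it just avoids carrying the grouping into later steps.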


answered Oct 13 '22 18:10

Rentrop

Using zoo::na.locf directly on the whole data.frame would fill the NAs regardless of the userID groups. Unfortunately, dplyr's grouping has no effect when na.locf is applied to the whole data frame, which is why I went with a split:

    library(dplyr); library(zoo)

    ps1 %>% split(ps1$userID) %>%
      lapply(function(x) {na.locf(na.locf(x), fromLast = TRUE)}) %>%
      do.call(rbind, .)
    ####      userID color  age gender
    #### 21.1     21  blue 3yrs      F
    #### 21.2     21  blue 2yrs      F
    #### 21.3     21   red 2yrs      M
    #### 22.4     22  blue 3yrs      F
    #### 22.5     22  blue 3yrs      F
    #### 22.6     22  blue 3yrs      F
    #### 23.7     23   red 4yrs      F
    #### 23.8     23   red 4yrs      F
    #### 23.9     23  gold 4yrs      F

This first splits the data into 3 data.frames, one per userID. The anonymous function in lapply then applies a first pass of imputation downwards, followed by a second pass upwards. Finally, rbind brings the data.frames back together, which gives the expected output.
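As an alternative sketch (not part of the original answer), na.locf can also be called column by column inside mutate() on a grouped data frame, in which case it does run group by group. One likely reason the attempt in the question fails is that na.locf drops leading NAs by default (`na.rm = TRUE`), so the result can be shorter than the group; the column names in that attempt also differ in case from those in `ps1`. The helper `fill_both` and the name `filled_zoo` are hypothetical, `stringsAsFactors = FALSE` is my own addition, and across() assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(zoo)

# Same data as in the question; stringsAsFactors = FALSE is an added
# assumption so the columns are character on old and new R versions.
ps1 <- data.frame(
  userID = c(21, 21, 21, 22, 22, 22, 23, 23, 23),
  color  = c(NA, "blue", "red", "blue", NA, NA, "red", NA, "gold"),
  age    = c("3yrs", "2yrs", NA, NA, "3yrs", NA, NA, "4yrs", NA),
  gender = c("F", NA, "M", NA, NA, "F", "F", NA, "F"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fill downwards, then upwards, keeping the length.
fill_both <- function(x) {
  # na.rm = FALSE keeps leading NAs instead of dropping them
  down <- na.locf(x, na.rm = FALSE)
  # fromLast = TRUE fills the remaining leading NAs from the next value
  na.locf(down, na.rm = FALSE, fromLast = TRUE)
}

filled_zoo <- ps1 %>%
  group_by(userID) %>%
  mutate(across(c(color, age, gender), fill_both)) %>%
  ungroup()

print(filled_zoo)
```

Compared with the split/lapply/rbind approach, this keeps the original row order and row names, at the cost of listing the columns to fill explicitly.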

answered Oct 13 '22 16:10

agenis
