 

NA filling only if "sandwiched" by the same value using dplyr

Tags: r, na, dplyr

OK, here is yet another missing-value filling question.

I am looking for a way to fill NAs based on both the previous and the next non-missing value in a column. Standard filling in a single direction is not sufficient for this task.

If the previous and next non-missing values are not the same, the run of NAs between them remains NA. Likewise, a run at the start or end of the column, which has a value on only one side, stays NA.


The code for the sample data frame is:

library(dplyr)

df_in <- tibble(id   = 1:12,
                var1 = letters[1:12],
                var2 = c(NA, rep("A", 2), rep(NA, 2), rep("A", 2), rep(NA, 2), rep("B", 2), NA))
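Applying the rule by hand to var2 gives the target column. The sketch below writes it out as a plain vector (the name `expected` is hypothetical, chosen only for illustration) so that any candidate solution can be checked against it.

```r
# var2 from df_in, and the hand-derived target under the "sandwiched" rule
var2     <- c(NA, "A", "A", NA, NA, "A", "A", NA, NA, "B", "B", NA)
expected <- c(NA, "A", "A", "A", "A", "A", "A", NA, NA, "B", "B", NA)

# rows 4-5:  between "A" and "A"     -> filled with "A"
# rows 8-9:  between "A" and "B"     -> stay NA
# rows 1, 12: value on one side only -> stay NA
# every non-missing value is left unchanged:
stopifnot(identical(expected[!is.na(var2)], var2[!is.na(var2)]))
```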

Thanks,

Mario Reyes asked Aug 06 '18 09:08



2 Answers

Compare zoo::na.locf() (last observation carried forward) with zoo::na.locf(fromLast = TRUE) (next observation carried backward), and keep the filled value only where the two agree:

# fill forward and backward; keep the value only where both directions agree
mutate(df_in,
       var_new = if_else(
         zoo::na.locf(var2, na.rm = FALSE) ==
           zoo::na.locf(var2, na.rm = FALSE, fromLast = TRUE),
         zoo::na.locf(var2, na.rm = FALSE),
         NA_character_
       ))

# # A tibble: 12 x 4
#       id var1  var2  var_new
#    <int> <chr> <chr> <chr>  
#  1     1 a     NA    NA     
#  2     2 b     A     A      
#  3     3 c     A     A      
#  4     4 d     NA    A      
#  5     5 e     NA    A      
#  6     6 f     A     A      
#  7     7 g     A     A      
#  8     8 h     NA    NA     
#  9     9 i     NA    NA     
# 10    10 j     B     B      
# 11    11 k     B     B      
# 12    12 l     NA    NA 
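The same forward/backward comparison can be sketched without zoo, assuming tidyr is installed: tidyr::fill() with .direction = "down" and .direction = "up" plays the role of na.locf() and na.locf(fromLast = TRUE). The helper columns `down` and `up` and the result name `df_out` are hypothetical and used only for illustration.

```r
library(dplyr)
library(tidyr)

df_in <- tibble(id   = 1:12,
                var1 = letters[1:12],
                var2 = c(NA, rep("A", 2), rep(NA, 2), rep("A", 2), rep(NA, 2), rep("B", 2), NA))

df_out <- df_in %>%
  mutate(down = var2, up = var2) %>%           # two copies of var2 to fill
  fill(down, .direction = "down") %>%          # carry the last value forward
  fill(up, .direction = "up") %>%              # carry the next value backward
  # equal -> both neighbours agree; unequal or NA comparison -> NA
  mutate(var_new = if_else(down == up, down, NA_character_)) %>%
  select(-down, -up)

df_out
```

As in the zoo version, a leading or trailing run compares a value with NA, so if_else() returns NA for it.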
Aurèle answered Oct 31 '22 17:10



Something like this?

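df_in %>% mutate(var_new = {
       tmp <- var2
       tmp[is.na(tmp)] <- "NA"   # placeholder, so that consecutive NAs form one run
       rl <- rle(tmp)            # run-length encoding of var2
       runs <- tibble(before  = c(NA, head(rl$values, -1)),  # value of the previous run
                      value   = rl$values,
                      after   = c(tail(rl$values, -1), NA),  # value of the next run
                      lengths = rl$lengths) %>%
         # fill a run of "NA" only when the runs on both sides have the same value
         mutate(value = ifelse(value == "NA" & before == after, before, value),
                value = ifelse(value == "NA", NA, value))
       rep(runs$value, runs$lengths)  # expand the runs back to one value per row
     })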

# # A tibble: 12 x 4
#       id var1  var2  var_new
#    <int> <chr> <chr> <chr>  
#  1     1 a     NA    <NA>   
#  2     2 b     A     A      
#  3     3 c     A     A      
#  4     4 d     NA    A      
#  5     5 e     NA    A      
#  6     6 f     A     A      
#  7     7 g     A     A      
#  8     8 h     NA    <NA>   
#  9     9 i     NA    <NA>   
# 10    10 j     B     B      
# 11    11 k     B     B      
# 12    12 l     NA    <NA>

Explanation

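  1. Convert NA to the string "NA", because rle() starts a new run at every NA (NA == NA is NA), so consecutive NAs would not be merged into one run.
  2. Create a run-length encoded representation of tmp.
  3. Now you can look at the values before and after each run of "NA"s.
  4. Replace a run of "NA"s with its neighbours' value when both neighbours agree, turn the remaining "NA" placeholders back into NA, and expand the runs to full length.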
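The four steps can also be sketched in base R without building an intermediate tibble, using inverse.rle() to expand the runs. The function name fill_sandwiched() and its `placeholder` argument are hypothetical; the sketch assumes a character column in which the literal placeholder string never occurs as real data.

```r
# rle() starts a new run at every NA, so consecutive NAs are not merged
rle(c(NA, NA, "A"))$lengths
# [1] 1 1 1

# Hypothetical helper: fill runs of NA only when the runs on both sides agree.
# Assumes x is a character vector that never contains the string `placeholder`.
fill_sandwiched <- function(x, placeholder = "NA") {
  tmp <- x
  tmp[is.na(tmp)] <- placeholder                 # step 1: make NA runs countable
  rl <- rle(tmp)                                 # step 2: run-length encoding
  n <- length(rl$values)
  before <- c(NA, rl$values[-n])                 # step 3: value of the previous run
  after  <- c(rl$values[-1], NA)                 #         value of the next run
  sandwiched <- rl$values == placeholder &
    !is.na(before) & !is.na(after) & before == after
  rl$values[sandwiched] <- before[sandwiched]    # step 4: fill agreeing gaps
  rl$values[rl$values == placeholder] <- NA      #         restore the other gaps
  inverse.rle(rl)                                # expand runs back to full length
}

fill_sandwiched(c(NA, "A", "A", NA, NA, "A", "A", NA, NA, "B", "B", NA))
```

With dplyr loaded, it can be used as mutate(df_in, var_new = fill_sandwiched(var2)). The explicit !is.na() checks keep `sandwiched` free of NAs, so the logical subsetting in step 4 is safe at the edges of the column.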
thothal answered Oct 31 '22 19:10
