
How to fill down values with limit in R? [duplicate]

Tags:

r

I am looking for an R equivalent of the pandas call fillna(method='bfill', limit=30).

I have this data frame.

DATE                      ELE.CN
   <dttm>                     <dbl>
 1 2009-06-30 00:00:00 115942928608
 2 2009-06-28 00:00:00 115942928608
 3 2009-06-27 00:00:00 115942928608
 4 2009-06-26 00:00:00 115942928608
 5 2009-06-24 00:00:00           NA
 6 2009-06-23 00:00:00           NA
 7 2009-06-21 00:00:00           NA
 8 2009-06-20 00:00:00           NA
 9 2009-06-19 00:00:00           NA
10 2009-06-17 00:00:00           NA
...

The idea is to fill the NAs, but only up to a limit of 30 consecutive NAs. I have searched but have not found anything similar.

Thank you.

Mick asked Nov 30 '25 02:11


2 Answers

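One potential solution is vec_fill_missing() from the vctrs package. It fills downward by default, and its max_fill argument caps the number of consecutive missing values that get filled (the example below uses max_fill = 3 rather than 30 so that the limit is visible on ten rows):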

library(tidyverse)
library(vctrs)

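# Note: each data row has three whitespace-separated fields but the header has two,
# so read.table() uses the date as row names and DATE holds the time string
df <- read.table(text = "DATE                      ELE.CN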
 2009-06-30 00:00:00 115942928608
 2009-06-28 00:00:00 115942928608
 2009-06-27 00:00:00 115942928608
 2009-06-26 00:00:00 115942928608
 2009-06-24 00:00:00           NA
 2009-06-23 00:00:00           NA
 2009-06-21 00:00:00           NA
 2009-06-20 00:00:00           NA
 2009-06-19 00:00:00           NA
 2009-06-17 00:00:00           NA", header = TRUE)
df
#>                DATE       ELE.CN
#> 2009-06-30 00:00:00 115942928608
#> 2009-06-28 00:00:00 115942928608
#> 2009-06-27 00:00:00 115942928608
#> 2009-06-26 00:00:00 115942928608
#> 2009-06-24 00:00:00           NA
#> 2009-06-23 00:00:00           NA
#> 2009-06-21 00:00:00           NA
#> 2009-06-20 00:00:00           NA
#> 2009-06-19 00:00:00           NA
#> 2009-06-17 00:00:00           NA

# Fill downward, but stop after 3 consecutive NAs in a run
df %>%
  mutate(ELE.CN = vec_fill_missing(ELE.CN, max_fill = 3))
#>                DATE       ELE.CN
#> 2009-06-30 00:00:00 115942928608
#> 2009-06-28 00:00:00 115942928608
#> 2009-06-27 00:00:00 115942928608
#> 2009-06-26 00:00:00 115942928608
#> 2009-06-24 00:00:00 115942928608
#> 2009-06-23 00:00:00 115942928608
#> 2009-06-21 00:00:00 115942928608
#> 2009-06-20 00:00:00           NA
#> 2009-06-19 00:00:00           NA
#> 2009-06-17 00:00:00           NA

Created on 2022-07-14 by the reprex package (v2.0.1)
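
If adding vctrs as a dependency is not an option, the same capped fill can be sketched in base R. This is only a minimal sketch, not part of the answer above: the helper name fill_down_limit and its limit argument are hypothetical, and it assumes the vector is already in the order you want to fill along.

```r
# Hypothetical helper: carry the last observed value downward,
# but fill at most `limit` consecutive NAs after each observed value.
fill_down_limit <- function(x, limit = Inf) {
  stopifnot(limit >= 0)
  is_obs <- !is.na(x)
  # Position of the most recent observed value at each index (0 if none yet)
  last_obs <- cummax(seq_along(x) * is_obs)
  # Number of steps since that observed value
  gap <- seq_along(x) - last_obs
  # Fill only NAs that have an earlier observation within `limit` steps
  fill <- !is_obs & last_obs > 0 & gap <= limit
  x[fill] <- x[last_obs[fill]]
  x
}

x <- c(1, NA, NA, NA, 2, NA)
filled <- fill_down_limit(x, limit = 2)
print(filled)  # values: 1, 1, 1, NA, 2, 2
```

Leading NAs are left untouched because there is no earlier value to carry. To fill in the other direction, one option is rev(fill_down_limit(rev(x), limit)).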

jared_mamrot answered Dec 01 '25 15:12


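Here is another suggestion. It works on your example, but it is not a general solution without tweaking: it counts NAs cumulatively over the whole column and fills them with the column's first value, so it is only correct when the NAs form a single run after the initial values: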

library(dplyr)

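df %>%
  # group_id is the running count of NAs seen so far in the whole column
  group_by(group_id = cumsum(is.na(ELE.CN))) %>%
  ungroup() %>%
  # fill an NA only while that running count is at most 30;
  # .keep = "unused" drops the helper column group_id
  mutate(ELE.CN = ifelse(is.na(ELE.CN) &
                           (group_id >= 0 & group_id <= 30),
                         first(ELE.CN), ELE.CN), .keep = "unused")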
   DATE                      ELE.CN
   <chr>                      <dbl>
 1 2009-06-30 00:00:00 115942928608
 2 2009-06-28 00:00:00 115942928608
 3 2009-06-27 00:00:00 115942928608
 4 2009-06-26 00:00:00 115942928608
 5 2009-06-24 00:00:00 115942928608
 6 2009-06-23 00:00:00 115942928608
 7 2009-06-21 00:00:00 115942928608
 8 2009-06-20 00:00:00 115942928608
 9 2009-06-19 00:00:00 115942928608
10 2009-06-17 00:00:00 115942928608
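
One possible tweak for the general case is to start a new group at every observed value, so that each run of NAs is filled from its own preceding value and the limit is counted per run. The following is a sketch under that assumption, not the answerer's code; the column run_id, the variable limit, and the small data frame are made up for illustration.

```r
library(dplyr)

# Illustrative data with two observed values and a run of NAs in between
df <- data.frame(
  DATE   = c("2009-06-30", "2009-06-28", "2009-06-27", "2009-06-26", "2009-06-24"),
  ELE.CN = c(115942928608, NA, NA, NA, 7)
)

limit <- 2  # maximum number of consecutive NAs to fill after each value

out <- df %>%
  # run_id increases at every observed value, so each group is
  # one observed value followed by the NAs that come after it
  group_by(run_id = cumsum(!is.na(ELE.CN))) %>%
  # rows within `limit` steps of the group's first row take its value
  mutate(ELE.CN = if_else(row_number() - 1 <= limit, first(ELE.CN), ELE.CN)) %>%
  ungroup() %>%
  select(-run_id)

print(out)
```

With limit set to 2, the first two NAs take the value above them, the third stays NA, and the final observed value is left unchanged.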

TarJae answered Dec 01 '25 15:12