I'm looking for an R equivalent of the Python (pandas) call fillna(method='bfill', limit=30).
I have this data frame.
DATE ELE.CN
<dttm> <dbl>
1 2009-06-30 00:00:00 115942928608
2 2009-06-28 00:00:00 115942928608
3 2009-06-27 00:00:00 115942928608
4 2009-06-26 00:00:00 115942928608
5 2009-06-24 00:00:00 NA
6 2009-06-23 00:00:00 NA
7 2009-06-21 00:00:00 NA
8 2009-06-20 00:00:00 NA
9 2009-06-19 00:00:00 NA
10 2009-06-17 00:00:00 NA
...
The idea is to fill the NAs, but with a limit of 30, i.e. at most 30 consecutive NAs per gap. I have searched but have not found anything similar.
Thank you.
One potential solution is to use vec_fill_missing() from the vctrs package, which has a max_fill argument that caps how many consecutive missing values are filled:
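library(tidyverse)
library(vctrs)
# Note: the header has two fields but each row has three (the datetime
# contains a space), so read.table() uses the date part as row names and
# DATE holds only the time. Quote the datetimes to keep them in one column.
df <- read.table(text = "DATE ELE.CN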
2009-06-30 00:00:00 115942928608
2009-06-28 00:00:00 115942928608
2009-06-27 00:00:00 115942928608
2009-06-26 00:00:00 115942928608
2009-06-24 00:00:00 NA
2009-06-23 00:00:00 NA
2009-06-21 00:00:00 NA
2009-06-20 00:00:00 NA
2009-06-19 00:00:00 NA
2009-06-17 00:00:00 NA", header = TRUE)
df
#> DATE ELE.CN
#> 2009-06-30 00:00:00 115942928608
#> 2009-06-28 00:00:00 115942928608
#> 2009-06-27 00:00:00 115942928608
#> 2009-06-26 00:00:00 115942928608
#> 2009-06-24 00:00:00 NA
#> 2009-06-23 00:00:00 NA
#> 2009-06-21 00:00:00 NA
#> 2009-06-20 00:00:00 NA
#> 2009-06-19 00:00:00 NA
#> 2009-06-17 00:00:00 NA
# max_fill = 3 for this short example; use max_fill = 30 for the real data
df %>%
  mutate(ELE.CN = vec_fill_missing(ELE.CN, max_fill = 3))
#> DATE ELE.CN
#> 2009-06-30 00:00:00 115942928608
#> 2009-06-28 00:00:00 115942928608
#> 2009-06-27 00:00:00 115942928608
#> 2009-06-26 00:00:00 115942928608
#> 2009-06-24 00:00:00 115942928608
#> 2009-06-23 00:00:00 115942928608
#> 2009-06-21 00:00:00 115942928608
#> 2009-06-20 00:00:00 NA
#> 2009-06-19 00:00:00 NA
#> 2009-06-17 00:00:00 NA
Created on 2022-07-14 by the reprex package (v2.0.1)
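A minimal sketch of the direction argument, using a made-up toy vector x (an assumption, not the question's data). vec_fill_missing() defaults to direction = "down", which carries values forward in row order like pandas ffill. The closer match to pandas bfill is direction = "up", which fills each NA from the next non-missing value below it. Since the question's data are sorted by descending date, it is worth checking which direction you actually want.

```r
library(vctrs)

# Toy vector (assumed for illustration) with two gaps of missing values
x <- c(1, NA, NA, NA, 2, NA)

# "down" (the default): carry the last value forward, at most 2 NAs per gap
vec_fill_missing(x, direction = "down", max_fill = 2)  # c(1, 1, 1, NA, 2, 2)

# "up": take the next value below, like pandas bfill; the trailing NA has no
# value below it and stays NA. max_fill = 30 would mirror limit=30.
vec_fill_missing(x, direction = "up", max_fill = 2)    # c(1, NA, 2, 2, 2, NA)
```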
Here is another suggestion. It works on your example but may fail (or need tweaking) in the general case:
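library(dplyr)
df %>%
  # group_by() + ungroup() just adds group_id (and turns df into a tibble);
  # group_id counts the NAs seen so far (0 before the first NA)
  group_by(group_id = cumsum(is.na(ELE.CN))) %>%
  ungroup() %>%
  # Fill the first 30 NAs of the column with its first value; this only
  # works for a single gap preceded by the value to carry forward
  mutate(ELE.CN = ifelse(is.na(ELE.CN) & group_id <= 30,
                         first(ELE.CN), ELE.CN),
         .keep = "unused")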
DATE ELE.CN
<chr> <dbl>
1 2009-06-30 00:00:00 115942928608
2 2009-06-28 00:00:00 115942928608
3 2009-06-27 00:00:00 115942928608
4 2009-06-26 00:00:00 115942928608
5 2009-06-24 00:00:00 115942928608
6 2009-06-23 00:00:00 115942928608
7 2009-06-21 00:00:00 115942928608
8 2009-06-20 00:00:00 115942928608
9 2009-06-19 00:00:00 115942928608
10 2009-06-17 00:00:00 115942928608
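The cumsum() approach above fills only the first 30 NAs of the whole column, all with the first value. For data with several gaps, a base-R sketch like the following applies the limit to each gap separately. fill_limited() is a hypothetical helper, not a function from any package, and it assumes that, as with pandas limit, a gap longer than limit is only partially filled.

```r
# Hypothetical helper: fill NAs from the nearest non-missing value above
# ("down") or below ("up"), filling at most `limit` consecutive NAs per gap.
fill_limited <- function(x, limit = 30, direction = c("down", "up")) {
  direction <- match.arg(direction)
  # Filling "up" is the same as filling "down" on the reversed vector
  if (direction == "up") {
    return(rev(fill_limited(rev(x), limit, "down")))
  }
  # Each non-missing value starts a new run; the NAs after it share its id
  # (leading NAs get run id 0)
  run_id <- cumsum(!is.na(x))
  # Position within the run: 1 is the non-missing value, 2.. are the NAs
  pos <- ave(seq_along(x), run_id, FUN = seq_along)
  # The value that starts each run (NA for the leading run with id 0)
  run_start <- c(NA, x[!is.na(x)])[run_id + 1]
  # Only the first `limit` NAs after each value are filled
  fill <- is.na(x) & pos <= limit + 1
  x[fill] <- run_start[fill]
  x
}

x <- c(1, NA, NA, NA, 2, NA)
fill_limited(x, limit = 2)                    # c(1, 1, 1, NA, 2, 2)
fill_limited(x, limit = 2, direction = "up")  # c(1, NA, 2, 2, 2, NA)
```

For the question's data, fill_limited(df$ELE.CN, limit = 30) would fill downward in row order, and direction = "up" would mirror pandas bfill.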