I am attempting to fill some values in a data set. A simplified version of my data can be found below:
> example_df
Date GROUP value
157 2018-01-31 10180 3.464
158 2018-02-28 10180 3.413
159 2018-03-31 10180 3.418
160 2018-04-30 10180 NA
161 2018-05-31 10180 NA
162 2018-06-30 10180 NA
163 2018-07-31 10180 NA
164 2018-08-31 10180 NA
165 2018-09-30 10180 NA
166 2018-10-31 10180 NA
167 2018-11-30 10180 NA
168 2018-12-31 10180 NA
169 2019-01-31 10180 NA
170 2019-02-28 10180 NA
171 2019-03-31 10180 NA
172 2019-04-30 10180 NA
173 2019-05-31 10180 NA
174 2019-06-30 10180 NA
175 2019-07-31 10180 NA
176 2019-08-31 10180 NA
177 2019-09-30 10180 NA
178 2019-10-31 10180 NA
179 2019-11-30 10180 NA
373 2018-01-31 10420 5.085
374 2018-02-28 10420 5.051
375 2018-03-31 10420 4.993
376 2018-04-30 10420 NA
377 2018-05-31 10420 NA
378 2018-06-30 10420 NA
379 2018-07-31 10420 NA
380 2018-08-31 10420 NA
381 2018-09-30 10420 NA
382 2018-10-31 10420 NA
383 2018-11-30 10420 NA
384 2018-12-31 10420 NA
385 2019-01-31 10420 NA
386 2019-02-28 10420 NA
387 2019-03-31 10420 NA
388 2019-04-30 10420 NA
389 2019-05-31 10420 NA
390 2019-06-30 10420 NA
391 2019-07-31 10420 NA
392 2019-08-31 10420 NA
393 2019-09-30 10420 NA
394 2019-10-31 10420 NA
395 2019-11-30 10420 NA
589 2018-01-31 10500 5.796
590 2018-02-28 10500 5.860
591 2018-03-31 10500 5.913
592 2018-04-30 10500 NA
593 2018-05-31 10500 NA
594 2018-06-30 10500 NA
595 2018-07-31 10500 NA
596 2018-08-31 10500 NA
597 2018-09-30 10500 NA
598 2018-10-31 10500 NA
599 2018-11-30 10500 NA
600 2018-12-31 10500 NA
601 2019-01-31 10500 NA
602 2019-02-28 10500 NA
603 2019-03-31 10500 NA
604 2019-04-30 10500 NA
605 2019-05-31 10500 NA
606 2019-06-30 10500 NA
607 2019-07-31 10500 NA
608 2019-08-31 10500 NA
609 2019-09-30 10500 NA
610 2019-10-31 10500 NA
611 2019-11-30 10500 NA
As you can see. For each group I have values up to today's month, and then a set of NAs until the start of the next group. What I would like to do would be, for each group, fill these NAs with a sequence which originates from the last non-NULL value and increases by a fixed value (I have chosen 0.065) until the final date of the group. I would prefer a dplyr solution, but any information on how to achieve this would be very helpful. Thank you.
library(data.table)
dt = as.data.table(yourdf) # or convert in place using setDT
dt[, value := value[1] + 0.065 * (1:.N - 1)
, by = .(GROUP, cumsum(!is.na(value)))]
You could do something like this (inspired by Frank and eddi's comments/solutions):
df$value2 <- ave(df$value, df$GROUP, cumsum(!is.na(df$value)),
FUN = function(x) x[1] + 0.065 * (1:length(x) - 1))
Or my original ave
:
df$value2 <- ave(df$value, df$GROUP,
FUN = function(x) {nas_to_replace <- is.na(x) & seq_along(x) > tail(which(!is.na(x)),1)
replace(x, nas_to_replace,
tail(x[!is.na(x)],1) + 0.065*(1:sum(nas_to_replace)))} )
This function is designed to replace only the NAs that come after the last non-NA. So if you have a vector like c(NA, 1, 2, NA, NA)
it would replace only the last two elements.
head(df)
# Date GROUP value value2
#1 2018-01-31 10180 3.464 3.464
#2 2018-02-28 10180 3.413 3.413
#3 2018-03-31 10180 3.418 3.418
#4 2018-04-30 10180 NA 3.483
#5 2018-05-31 10180 NA 3.548
#6 2018-06-30 10180 NA 3.613
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With