First question here. After some searching and scrolling, I'm still stuck.
I have a big vector that should always be increasing, but it sometimes resets to 0. Every time it resets to 0, I'd like the last non-zero value before the reset to be added to all the following values. I've tried LOCF, but it doesn't work: it only fills my 0 values with the previous value, and then the series drops back to the lowest value.
Vector example:
Data | Desired transformation |
---|---|
0 | 0 |
0 | 0 |
1 | 1 |
2 | 2 |
3 | 3 |
5 | 5 |
6 | 6 |
0 | 6 |
0 | 6 |
1 | 7 |
2 | 8 |
Perhaps you can try cumsum + rle, like below:
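v <- df$Data
# First position of every run of zeros (works for zero runs of any length)
idx <- with(
  rle(v == 0),
  (cumsum(lengths) - lengths + 1)[values]
)
# At each run start, add the value just before the reset; pmax() guards idx == 1
df$DataOut <- v + cumsum(replace(rep(0, length(v)), idx, v[pmax(1, idx - 1)]))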
which gives
> df
# A tibble: 11 x 2
Data DataOut
<dbl> <dbl>
1 0 0
2 0 0
3 1 1
4 2 2
5 3 3
6 5 5
7 6 6
8 0 6
9 0 6
10 1 7
11 2 8
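To make the run-length step less opaque, here is a small sketch of the intermediate values that rle produces for the example vector, and of how the first position of each zero run can be derived from them. The variable names (r, ends, starts, zero_starts) are illustrative only.

```r
v <- c(0, 0, 1, 2, 3, 5, 6, 0, 0, 1, 2)

r <- rle(v == 0)
r$values    # TRUE FALSE TRUE FALSE: zero run, counting run, zero run, counting run
r$lengths   # 2 5 2 2

ends   <- cumsum(r$lengths)       # last index of each run:  2 7 9 11
starts <- ends - r$lengths + 1    # first index of each run: 1 3 8 10

zero_starts <- starts[r$values]   # where the zero runs begin: 1 8

# Value sitting just before each zero run (pmax avoids index 0 at the start)
offsets <- v[pmax(1, zero_starts - 1)]   # 0 6
```

Each offset is what has to be added to everything from the corresponding zero run onwards, which is what the cumsum over the replaced zero vector does.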
I think this will also work (I have kept the helper column d so it is easier to see what is actually happening):
df %>% mutate(d = c(0, diff(Data)),
out = cumsum(pmax(-1 *Data, d)))
Data d out
<dbl> <dbl> <dbl>
1 0 0 0
2 0 0 0
3 1 1 1
4 2 1 2
5 3 1 3
6 5 2 5
7 6 1 6
8 0 -6 6
9 0 0 6
10 1 1 7
11 2 1 8
Once that is clear, you can drop the helper column and simply do:
df %>% mutate(out = cumsum(pmax(-1 *Data, c(0, diff(Data)))))
# A tibble: 11 x 2
Data out
<dbl> <dbl>
1 0 0
2 0 0
3 1 1
4 2 2
5 3 3
6 5 5
7 6 6
8 0 6
9 0 6
10 1 7
11 2 8
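The same idea can be wrapped in a small base R helper. This is only a sketch: the name undo_resets is made up for illustration, and it assumes the vector is non-negative and that every reset lands exactly on 0, as in the question. Under that assumption, clamping negative differences to zero gives the same result as comparing against -Data.

```r
# Hypothetical helper (not from any package): rebuild a monotone counter
# from a non-negative vector that occasionally resets to 0.
undo_resets <- function(v) {
  # Step-to-step changes; the first element is kept as the starting level.
  d <- c(v[1], diff(v))
  # A negative change marks a reset, so it contributes nothing.
  cumsum(pmax(0, d))
}

v <- c(0, 0, 1, 2, 3, 5, 6, 0, 0, 1, 2)
undo_resets(v)
# [1] 0 0 1 2 3 5 6 6 6 7 8
```

Inside a pipeline this could be used as df %>% mutate(out = undo_resets(Data)).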
I believe there are much better ways than a for loop to answer your question, but this one is quite stable and produces your desired output. I used to be a big fan of for loops, and whenever I need a solution that requires more flexibility I do not hesitate to use them. In your case this was the first solution that came to mind.
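out <- numeric(nrow(df))
out[1] <- df$Data[1]  # initialise once, outside the loop
for (i in seq_len(nrow(df))[-1]) {
  # carry the running total forward by the step in Data
  out[i] <- out[i - 1] + (df$Data[i] - df$Data[i - 1])
  # a drop to 0 is a reset: keep the previous total instead
  if (df$Data[i] == 0 && df$Data[i - 1] != 0) {
    out[i] <- out[i - 1]
  }
}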
cbind(df, out)
Data out
1 0 0
2 0 0
3 1 1
4 2 2
5 3 3
6 5 5
7 6 6
8 0 6
9 0 6
10 1 7
11 2 8
Data
df <- tibble(
Data = c(0, 0, 1, 2, 3, 5, 6, 0, 0, 1, 2)
)
Setup:
library(tidyverse)
(df <- tibble::tibble( Data = c(0, 0, 2, 4, 0, 0, 1, 2, 0, 1, 2)))
Actual code
(
df
%>% mutate(to_add = Data - lag(Data),
to_add = ifelse(is.na(to_add) | to_add < 0, 0, to_add),
out = cumsum(to_add))
%>% select( ! to_add)
)
# A tibble: 11 x 2
Data out
<dbl> <dbl>
1 0 0
2 0 0
3 2 2
4 4 4
5 0 4
6 0 4
7 1 5
8 2 6
9 0 6
10 1 7
11 2 8
The trick is to use the lag function, which returns the value from the previous row.
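For readers without dplyr, the same shift can be sketched in base R. The helper name lag1 is hypothetical; it is meant to mimic what dplyr's lag does with its default settings (shift by one position, pad the front with NA).

```r
# Hypothetical base R stand-in for dplyr::lag(x) with default arguments
lag1 <- function(x) c(NA, head(x, -1))

Data <- c(0, 0, 2, 4, 0, 0, 1, 2, 0, 1, 2)

# Change relative to the previous row; NA for the first row
to_add <- Data - lag1(Data)
# The first row and any drop (a reset) contribute nothing
to_add[is.na(to_add) | to_add < 0] <- 0

cumsum(to_add)
# [1] 0 0 2 4 4 4 5 6 6 7 8
```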
df <- data.frame( Data = c(0, 0, 1, 2, 0, 0, 1, 2, 0, 1, 2))
df$out <- cumsum(df$Data != 0)
Data out
1 0 0
2 0 0
3 1 1
4 2 2
5 0 2
6 0 2
7 1 3
8 2 4
9 0 4
10 1 5
11 2 6
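The trick is to count the non-zero rows and take a cumulative sum of that count (see cumsum). df$Data != 0 returns TRUE wherever 1 needs to be added, and TRUE is converted to the number 1 by cumsum. Note that this assumes every step increases by exactly 1; a jump such as the one from 3 to 5 in the question's data would be counted as a single step.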
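A quick way to see when counting non-zero rows is enough is to compare it with a difference-based version on data that contains a step larger than 1. The sketch below uses the vector from the question; the names by_count and by_diff are illustrative.

```r
v <- c(0, 0, 1, 2, 3, 5, 6, 0, 0, 1, 2)

# Counts one unit per non-zero row, whatever the actual step size
by_count <- cumsum(v != 0)

# Sums the actual increases and ignores the drops at each reset
by_diff <- cumsum(pmax(0, c(v[1], diff(v))))

by_count
# [1] 0 0 1 2 3 4 5 5 5 6 7
by_diff
# [1] 0 0 1 2 3 5 6 6 6 7 8
```

The two agree until the jump from 3 to 5, after which the count-based version lags behind by one.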