
Last observation added forward

Tags:

r

This is my first question here; after some searching and scrolling I'm still stuck.

I have a big vector that should always be increasing, but it sometimes resets to 0. Every time it resets to 0, I'd like the previous non-zero value to be added to the following values. I've tried LOCF, but it doesn't work: it only fills my 0 values with the previous value and then drops back to the lowest value.

Vector example:

Data  Desired transformation
   0  0
   0  0
   1  1
   2  2
   3  3
   5  5
   6  6
   0  6
   0  6
   1  7
   2  8
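
To make the LOCF problem described above concrete, here is a minimal base R sketch, assuming zeros are treated as the "missing" values. The `cummax` indexing trick is only a stand-in for a package LOCF function, not the exact call tried in the question.

```r
v <- c(0, 0, 1, 2, 3, 5, 6, 0, 0, 1, 2)

# Index of the most recent non-zero element at each position (0 if none yet)
last_nz <- cummax(seq_along(v) * (v != 0))

# Carry the last non-zero value forward; the leading 0 covers "none yet"
locf <- c(0, v)[last_nz + 1]
locf
# 0 0 1 2 3 5 6 6 6 1 2
```

The zeros at positions 8 and 9 are filled with 6, but positions 10 and 11 fall back to 1 and 2 instead of continuing to 7 and 8.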
fxpadi asked Apr 30 '21 10:04



4 Answers

Perhaps you can try cumsum + rle like below

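v <- df$Data
# Start position of every run of zeros in v
idx <- with(
  rle(v == 0),
  (cumsum(lengths) - lengths + 1)[values]
)
# At each run start, add the value just before the reset (v[1] if the run
# opens the vector), then accumulate those offsets on top of the raw values
df$DataOut <- v + cumsum(replace(rep(0, length(v)), idx, v[pmax(1, idx - 1)]))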

which gives

> df
# A tibble: 11 x 2
    Data DataOut
   <dbl>   <dbl>
 1     0       0
 2     0       0
 3     1       1
 4     2       2
 5     3       3
 6     5       5
 7     6       6
 8     0       6
 9     0       6
10     1       7
11     2       8
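
As a rough check of the run-length idea on zero runs of different lengths, the sketch below wraps it in a function. The name `add_forward_rle` and the second test vector are made up for illustration, and the function assumes resets always show up as one or more zeros.

```r
# Hypothetical helper: add the last value before each zero run to what follows
add_forward_rle <- function(v) {
  r <- rle(v == 0)
  starts <- cumsum(r$lengths) - r$lengths + 1   # first index of every run
  zero_starts <- starts[r$values]               # keep only the zero runs
  offsets <- numeric(length(v))
  # Value just before each reset; v[1] (which is 0) if the run opens the vector
  offsets[zero_starts] <- v[pmax(1, zero_starts - 1)]
  v + cumsum(offsets)
}

add_forward_rle(c(0, 0, 1, 2, 3, 5, 6, 0, 0, 1, 2))
# 0 0 1 2 3 5 6 6 6 7 8
add_forward_rle(c(1, 2, 0, 1, 0, 0, 0, 3))
# 1 2 2 3 3 3 3 6
```

The second call mixes a single zero with a run of three zeros, which is where the position of each run's first element matters.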
ThomasIsCoding answered Oct 18 '22 21:10


I think this will also work. (I have kept the dummy column d to make it easier to see what is actually happening here.)

df %>% mutate(d = c(0, diff(Data)),
              out = cumsum(pmax(-1 *Data, d)))

    Data     d   out
   <dbl> <dbl> <dbl>
 1     0     0     0
 2     0     0     0
 3     1     1     1
 4     2     1     2
 5     3     1     3
 6     5     2     5
 7     6     1     6
 8     0    -6     6
 9     0     0     6
10     1     1     7
11     2     1     8

Once you understand how it works, you can simply do

df %>% mutate(out = cumsum(pmax(-1 *Data, c(0, diff(Data)))))

# A tibble: 11 x 2
    Data   out
   <dbl> <dbl>
 1     0     0
 2     0     0
 3     1     1
 4     2     2
 5     3     3
 6     5     5
 7     6     6
 8     0     6
 9     0     6
10     1     7
11     2     8
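
The same idea can be sketched without dplyr. Note that this is a variant, not the exact expression above: it clamps negative steps with `pmax(d, 0)` instead of `pmax(-1 * Data, d)`, and it seeds the differences with the first value so that a vector not starting at 0 keeps that value. `add_forward_diff` is a hypothetical name.

```r
# Hypothetical base R helper: cumulative sum of the non-negative steps
add_forward_diff <- function(v) {
  # Seed with v[1] so the first observation is not lost
  d <- c(v[1], diff(v))
  # A negative step is a reset: count it as 0 instead of subtracting
  cumsum(pmax(d, 0))
}

add_forward_diff(c(0, 0, 1, 2, 3, 5, 6, 0, 0, 1, 2))
# 0 0 1 2 3 5 6 6 6 7 8
add_forward_diff(c(1, 2, 0, 1))
# 1 2 2 3
```

This assumes the values are non-negative and only ever decrease at a reset.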
AnilGoyal answered Oct 18 '22 19:10


There are probably much better ways than a for loop to answer your question, but this one is quite stable and produces your desired output. I used to be a big fan of for loops, and whenever I need a solution that requires more flexibility I do not hesitate to use them. In your case this was the first solution that came to mind.

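out <- vector("numeric", length = nrow(df))
out[[1]] <- df$Data[[1]]  # seed once, outside the loop
for (i in seq_len(nrow(df))[-1]) {
  if (df$Data[[i]] == 0 && df$Data[[i - 1]] != 0) {
    # reset to 0: carry the running total forward unchanged
    out[[i]] <- out[[i - 1]]
  } else {
    # otherwise add the step between consecutive raw values
    out[[i]] <- out[[i - 1]] + (df$Data[[i]] - df$Data[[i - 1]])
  }
}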

cbind(df, out)

   Data out
1     0   0
2     0   0
3     1   1
4     2   2
5     3   3
6     5   5
7     6   6
8     0   6
9     0   6
10    1   7
11    2   8

Data

library(tibble)

df <- tibble(
  Data = c(0, 0, 1, 2, 3, 5, 6, 0, 0, 1, 2)
)
Anoushiravan R answered Oct 18 '22 19:10


Using Tidyverse

Setup:

library(tidyverse)

(df <- tibble::tibble(Data = c(0, 0, 2, 4, 0, 0, 1, 2, 0, 1, 2)))

Actual code

(
  df
  %>% mutate(to_add = Data - lag(Data),
             to_add = ifelse(is.na(to_add) | to_add < 0, 0, to_add),
             out = cumsum(to_add))
  %>% select( ! to_add)
)

Output

# A tibble: 11 x 2
    Data   out
   <dbl> <dbl>
 1     0     0
 2     0     0
 3     2     2
 4     4     4
 5     0     4
 6     0     4
 7     1     5
 8     2     6
 9     0     6
10     1     7
11     2     8

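The trick is to use the lag function, which returns the value from the previous row. Negative differences (the resets) and the NA on the first row are set to 0 before taking the cumulative sum.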
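
To see the intermediate values behind the lag step, here is a small base R sketch using the values shown in the Data column of the output above. `c(NA, head(v, -1))` is a stand-in for `lag()`, chosen only to avoid the package dependency.

```r
v <- c(0, 0, 2, 4, 0, 0, 1, 2, 0, 1, 2)

# Value on the previous row; the first row has no predecessor, hence NA
prev <- c(NA, head(v, -1))

# Step from the previous row; NA on row 1, negative at every reset
to_add <- v - prev
to_add[is.na(to_add) | to_add < 0] <- 0

out <- cumsum(to_add)
data.frame(Data = v, prev = prev, to_add = to_add, out = out)
```

Printing the data frame puts `prev` and `to_add` side by side with the result, which makes it easy to spot the rows where a reset was zeroed out.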


Base R (works only if the non-zero values increase by exactly 1 at each step)

df <- data.frame( Data = c(0, 0, 1, 2, 0, 0, 1, 2, 0, 1, 2))

df$out <- cumsum(df$Data != 0)

output

   Data out
1     0   0
2     0   0
3     1   1
4     2   2
5     0   2
6     0   2
7     1   3
8     2   4
9     0   4
10    1   5
11    2   6

The trick is to flag the rows that are not zero and then take the cumulative sum of those flags (see cumsum).

df$Data != 0 returns TRUE wherever 1 should be added, and cumsum converts each TRUE to the number 1.
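
A short sketch of why this shortcut depends on steps of exactly 1. The vector `jumpy` and the hand-derived `desired` values are illustrative assumptions, not data from the question.

```r
consecutive <- c(0, 0, 1, 2, 0, 0, 1, 2, 0, 1, 2)
counted <- cumsum(consecutive != 0)   # each TRUE counts as 1
counted
# 0 0 1 2 2 2 3 4 4 5 6

# With steps larger than 1 the count no longer matches the desired totals
jumpy <- c(0, 0, 2, 4, 0, 0, 1, 2)
desired <- c(0, 0, 2, 4, 4, 4, 5, 6)  # worked out by hand
counted_jumpy <- cumsum(jumpy != 0)
counted_jumpy
# 0 0 1 2 2 2 3 4
all(counted_jumpy == desired)
# FALSE
```

Counting non-zero rows only tracks how many observations were seen, so it matches the running total only when every observation adds exactly 1.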

pietrodito answered Oct 18 '22 19:10