I have a column in a data frame (here named "a") where starts of an sequence are marked with 1, while subsequent incidents, belonging to the same sequence are marked with N/A. Now I would like to create a new column ("b") to index all incidents belonging to the same sequence (1:n) and then create a third column ("c") with numbers indicating which incidents belong to the same sequence.
I am sure the solution is very easy and striking once I see it, however, at the moment I just don't manage to come up with an idea myself how to best solve this. Also other questions did not cover my question, as far as I have seen.
Usually I am using dplyr (I also need to do some group_by with my data, which in reality is more complex than I outlined here), so I would be very happy about a dplyr solution if possible!
Example code to start with:
df <- data.frame("a"= c(1, NA, NA, NA, 1, NA, 1, 1, 1))
How it should look like in the end:
df_final <- data.frame("a"= c(1, NA, NA, NA, 1, NA, 1, 1, 1), "b"= c(1, 2, 3, 4, 1, 2, 1, 1, 1), "c" = c(1, 1, 1, 1, 2, 2, 3, 4, 5))
EDIT
Since the question has changed now, getting expected output is more simple now
library(dplyr)
df %>%
group_by(c = cumsum(!is.na(a))) %>%
mutate(b = row_number())
# a c b
# <dbl> <int> <int>
#1 1 1 1
#2 NA 1 2
#3 NA 1 3
#4 NA 1 4
#5 1 2 1
#6 NA 2 2
#7 1 3 1
#8 1 4 1
#9 1 5 1
And using base R that would be :
df$c <- cumsum(!is.na(df$a))
df$b <- with(df, ave(a, c, FUN = seq_along))
Original Answer
Unfortunately, the grouping for creation of b and c is different. For b we group_by sequential non-NA values and take cumulative over them and then generate a row_number for every group. For c we take rle on non-NA values and repeat the group values lengths times.
library(dplyr)
df %>%
group_by(group = cumsum(!is.na(a))) %>%
mutate(b = row_number()) %>%
ungroup() %>%
select(-group) %>%
mutate(c = with(rle(!is.na(a)), rep(cumsum(values), lengths)))
# A tibble: 9 x 3
# a b c
# <dbl> <int> <int>
#1 1 1 1
#2 NA 2 1
#3 NA 3 1
#4 NA 4 1
#5 1 1 2
#6 NA 2 2
#7 1 1 3
#8 1 1 3
#9 1 1 3
Of course this is not dplyr specific answer and can be answered with base R as well
df$b <- with(df, ave(a, cumsum(!is.na(a)), FUN = seq_along))
df$c <- with(df, with(rle(!is.na(a)), rep(cumsum(values), lengths)))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With