I have a data.frame, which has a column of integer values. I need to form a grouping variable that identifies sequence breaks in that column. For instance, I could create another column of ascending integers that adds one whenever the original column's value is not greater than its lagged value. How do I do this?
E.g. if I have a data.frame like this:
df <- data.frame(A = c(1,2,4,6,78,3,56,78,23))
I need some way to produce a new table with column B:
df$B <- c(1,1,1,1,1,2,2,2,3)
I have tried, e.g., with dplyr:
df %>% mutate(B = 1,
              B = case_when(A < lag(A), B + 1))
That is not quite correct.
We can use cumsum and diff, which will increment the value every time the sequence is broken:
cumsum(c(-1, diff(df$A)) < 0)
#[1] 1 1 1 1 1 2 2 2 3
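To see why this works, the one-liner can be split into its intermediate steps. This is only an illustrative breakdown; the names `steps`, `breaks`, and `B` are made up for the example.

```r
df <- data.frame(A = c(1, 2, 4, 6, 78, 3, 56, 78, 23))

# diff() gives the change from each value to the next (length n - 1)
steps <- diff(df$A)
# 1 2 2 72 -75 53 22 -55

# Prepend -1 so the first row counts as the start of group 1
breaks <- c(-1, steps) < 0
# TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE

# cumsum() treats TRUE as 1, so the counter goes up at each break
B <- cumsum(breaks)
# 1 1 1 1 1 2 2 2 3
```

Note that with `< 0`, two equal consecutive values stay in the same group. If a repeated value should also start a new group ("not greater than" the previous one), `<= 0` would be the comparison to use instead.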
We can also integrate this into a dplyr chain to get
library(dplyr)
df %>%
  mutate(B = cumsum(c(-1, diff(A)) < 0))
# A B
#1 1 1
#2 2 1
#3 4 1
#4 6 1
#5 78 1
#6 3 2
#7 56 2
#8 78 2
#9 23 3
A hacky way using lag could be
df %>%
  mutate(B = cumsum(c(-1, (A - lag(A))[-1]) < 0))
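A possibly tidier variant of the lag approach, offered as a sketch rather than the answer's own method: compare each value with its lag directly and let `coalesce()` turn the leading `NA` (the first row has no previous value) into `TRUE`, so the first row opens group 1. The name `out` is made up for the example.

```r
library(dplyr)

df <- data.frame(A = c(1, 2, 4, 6, 78, 3, 56, 78, 23))

out <- df %>%
  # A < lag(A) is NA for the first row; coalesce() replaces that NA with TRUE
  mutate(B = cumsum(coalesce(A < lag(A), TRUE)))

out$B
# 1 1 1 1 1 2 2 2 3
```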