data.table offers a nice convenience function, rleid, for generating run-length ids:
library(data.table)
DT = data.table(grp=rep(c("A", "B", "C", "A", "B"), c(2, 2, 3, 1, 2)), value=1:10)
rleid(DT$grp)
# [1] 1 1 2 2 3 3 3 4 5 5
I can mimic this in base R with:
df <- data.frame(DT)
rep(seq_along(rle(df$grp)$values), times = rle(df$grp)$lengths)
# [1] 1 1 2 2 3 3 3 4 5 5
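To make the base R one-liner above easier to follow, here is a small sketch that splits it into steps. It uses only base R; the variable names (x, r, ids) are made up for illustration.

```r
# Same grouping vector as DT$grp, built directly so no packages are needed
x <- rep(c("A", "B", "C", "A", "B"), c(2, 2, 3, 1, 2))

# rle() returns one entry per run: the run's value and its length
r <- rle(x)
r$values   # "A" "B" "C" "A" "B"
r$lengths  # 2 2 3 1 2

# Number the runs 1..k, then repeat each run number by that run's length
ids <- rep(seq_along(r$lengths), times = r$lengths)
ids        # 1 1 2 2 3 3 3 4 5 5
```

Storing the result of rle() once also avoids calling it twice, as the one-liner does.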
Does anyone know of a dplyr equivalent? Or is the "best" way to get the rleid behavior with dplyr to do something like the following?
library(dplyr)
my_rleid = rep(seq_along(rle(df$grp)$values), times = rle(df$grp)$lengths)
df %>%
  mutate(rleid = my_rleid)
You can just do (when you have both data.table and dplyr loaded):
DT <- DT %>% mutate(rlid = rleid(grp))
this gives:
> DT
grp value rlid
1: A 1 1
2: A 2 1
3: B 3 2
4: B 4 2
5: C 5 3
6: C 6 3
7: C 7 3
8: A 8 4
9: B 9 5
10: B 10 5
When you don't want to load data.table separately you can also use (as mentioned by @DavidArenburg in the comments):
DT <- DT %>% mutate(rlid = data.table::rleid(grp))
And as @RichardScriven said in his comment you can just copy/steal it:
myrleid <- data.table::rleid
If you want to use just base R and dplyr, the better way is to wrap your own one- or two-line version of rleid() in a function and then apply it whenever you need it.
library(dplyr)
myrleid <- function(x) {
  x <- rle(x)$lengths
  rep(seq_along(x), times = x)
}
## Try it out
DT <- DT %>% mutate(rlid = myrleid(grp))
DT
# grp value rlid
# 1: A 1 1
# 2: A 2 1
# 3: B 3 2
# 4: B 4 2
# 5: C 5 3
# 6: C 6 3
# 7: C 7 3
# 8: A 8 4
# 9: B 9 5
#10: B 10 5
You can do it using the lag function from dplyr.
DT <-
  DT %>%
  mutate(rleid = (grp != lag(grp, 1, default = "asdf"))) %>%
  mutate(rleid = cumsum(rleid))
gives
> DT
grp value rleid
1: A 1 1
2: A 2 1
3: B 3 2
4: B 4 2
5: C 5 3
6: C 6 3
7: C 7 3
8: A 8 4
9: B 9 5
10: B 10 5
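The idea behind the lag approach is "flag every position where the value differs from the previous one, then take a running sum of the flags". A minimal base R sketch of the same idea, assuming a non-empty vector without NAs (the names x, changed and ids_cumsum are illustrative only):

```r
x <- rep(c("A", "B", "C", "A", "B"), c(2, 2, 3, 1, 2))

# Compare each element with its predecessor; the first element always
# starts a new run, so it is flagged TRUE by hand.
changed <- c(TRUE, x[-1] != x[-length(x)])

# A running sum of the flags gives the run id for each position
ids_cumsum <- cumsum(changed)
ids_cumsum  # 1 1 2 2 3 3 3 4 5 5
```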
A simplification (involving no additional package) of the approach used by the OP could be:
DT %>%
mutate(rleid = with(rle(grp), rep(seq_along(lengths), lengths)))
grp value rleid
1 A 1 1
2 A 2 1
3 B 3 2
4 B 4 2
5 C 5 3
6 C 6 3
7 C 7 3
8 A 8 4
9 B 9 5
10 B 10 5
Or:
DT %>%
mutate(rleid = rep(seq(ls <- rle(grp)$lengths), ls))
There are a lot of very good solutions here, but I would like to note that some do not give the same result as data.table::rleid() when the data has NAs. Keep in mind that data.table::rleid() increments every time the value changes, and it treats NA as a value of its own, so consecutive NAs share one id.
Data:
library(data.table)
library(dplyr)
# Data
DT2 = data.table(grp=rep(c("A", "B", NA, "C", "A", NA, "B", NA), c(2, 2, 2, 3, 1, 1, 2, 1)), value=1:14)
df <- data.frame(DT2)
# data.table rleid
DT2[, rleid := rleid(grp)]
DT2
#> grp value rleid
#> 1: A 1 1
#> 2: A 2 1
#> 3: B 3 2
#> 4: B 4 2
#> 5: <NA> 5 3
#> 6: <NA> 6 3
#> 7: C 7 4
#> 8: C 8 4
#> 9: C 9 4
#> 10: A 10 5
#> 11: <NA> 11 6
#> 12: B 12 7
#> 13: B 13 7
#> 14: <NA> 14 8
Just for example, Alex's solution is perfect for the OP but doesn't give the same result as data.table::rleid() when dealing with NAs:
# Alex's solution
df %>%
mutate(rleid = (grp != lag(grp, 1, default = "asdf"))) %>%
mutate(rleid = cumsum(rleid))
#> grp value rleid
#> 1 A 1 1
#> 2 A 2 1
#> 3 B 3 2
#> 4 B 4 2
#> 5 <NA> 5 NA
#> 6 <NA> 6 NA
#> 7 C 7 NA
#> 8 C 8 NA
#> 9 C 9 NA
#> 10 A 10 NA
#> 11 <NA> 11 NA
#> 12 B 12 NA
#> 13 B 13 NA
#> 14 <NA> 14 NA
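The NA values in the output above follow from two base R behaviors: comparing anything with NA yields NA, and cumsum() propagates NA to every later position. The rle()-based versions behave differently again, because rle() treats each NA as its own run. A short base R sketch (variable names are illustrative):

```r
# A comparison involving NA is NA, not TRUE or FALSE
cmp <- NA != "B"
cmp            # NA

# Once cumsum() meets an NA, every later element is NA as well
running <- cumsum(c(1, NA, 1))
running        # 1 NA NA

# rle() never merges adjacent NAs, so two consecutive NAs become two runs
run_lengths <- rle(c("A", NA, NA, "B"))$lengths
run_lengths    # 1 1 1 1
```

So with NAs in the data, the lag/cumsum version returns NA ids, while an rle()-based version gives each NA a separate id; neither matches data.table::rleid().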
Here is an easy to read and understand tidyverse (although slower) equivalent to data.table::rleid():
# like rleid()
df %>%
  mutate(
    rleid = cumsum(
      ifelse(is.na(grp), "DEFAULT", grp) != lag(ifelse(is.na(grp), "DEFAULT", grp), default = "DEFAULT")
    )
  )
#> grp value rleid
#> 1 A 1 1
#> 2 A 2 1
#> 3 B 3 2
#> 4 B 4 2
#> 5 <NA> 5 3
#> 6 <NA> 6 3
#> 7 C 7 4
#> 8 C 8 4
#> 9 C 9 4
#> 10 A 10 5
#> 11 <NA> 11 6
#> 12 B 12 7
#> 13 B 13 7
#> 14 <NA> 14 8
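If you would rather not depend on a placeholder string such as "DEFAULT", the same NA handling can be written as a small base R helper. This is only a sketch: rleid_na is a hypothetical name, and it assumes an atomic vector. It also starts at 1 when the first value is NA, whereas the cumsum/lag expression above would start at 0 in that case.

```r
# Hypothetical helper: run ids where consecutive NAs count as one run
rleid_na <- function(x) {
  n <- length(x)
  if (n == 0L) return(integer(0))
  prev <- x[-n]
  curr <- x[-1L]
  # A run changes when exactly one side is NA, or when both sides are
  # non-NA and differ. Neither term can be NA, so cumsum() is safe.
  changed <- (is.na(prev) != is.na(curr)) |
    (!is.na(prev) & !is.na(curr) & prev != curr)
  cumsum(c(TRUE, changed))
}

grp <- rep(c("A", "B", NA, "C", "A", NA, "B", NA), c(2, 2, 2, 3, 1, 1, 2, 1))
ids_na <- rleid_na(grp)
ids_na  # 1 1 2 2 3 3 4 4 4 5 6 7 7 8
```

It can be used inside mutate() like any other vectorised function, for example df %>% mutate(rleid = rleid_na(grp)).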
Here is an easy to read and understand tidyverse equivalent to data.table::rleid() that ignores NAs:
# like rleid() but ignoring NAs
df %>%
  mutate(
    rleid = cumsum(
      (!is.na(grp)) & (grp != lag(ifelse(is.na(grp), "DEFAULT", grp), default = "DEFAULT"))
    )
  )
#> grp value rleid
#> 1 A 1 1
#> 2 A 2 1
#> 3 B 3 2
#> 4 B 4 2
#> 5 <NA> 5 2
#> 6 <NA> 6 2
#> 7 C 7 3
#> 8 C 8 3
#> 9 C 9 3
#> 10 A 10 4
#> 11 <NA> 11 4
#> 12 B 12 5
#> 13 B 13 5
#> 14 <NA> 14 5
Created on 2022-08-27 with reprex v2.0.2