I would like to create a new column that sequentially labels groups of rows. Original data:
> dt = data.table(index=(1:10), group = c("apple","apple","orange","orange","orange","orange","apple","apple","orange","apple"))
> dt
index group
1: 1 apple
2: 2 apple
3: 3 orange
4: 4 orange
5: 5 orange
6: 6 orange
7: 7 apple
8: 8 apple
9: 9 orange
10: 10 apple
Desired output:
index group id
1: 1 apple 1
2: 2 apple 1
3: 3 orange 1
4: 4 orange 1
5: 5 orange 1
6: 6 orange 1
7: 7 apple 2
8: 8 apple 2
9: 9 orange 2
10: 10 apple 3
dplyr
attempt:
dt %>% group_by(group) %>% mutate( id= row_number())
# A tibble: 10 x 3
# Groups: group [2]
index group id
<int> <chr> <int>
1 1 apple 1
2 2 apple 2
3 3 orange 1
4 4 orange 2
5 5 orange 3
6 6 orange 4
7 7 apple 3
8 8 apple 4
9 9 orange 5
10 10 apple 5
How can I edit this to get the first group of apples as 1, then the first group of oranges as 1, then the second group of apples as 2 etc (see desired output above). Also open to data.table
solution.
library(data.table)
dt[, id := cumsum(c(TRUE, diff(index) > 1)), by="group"]
dt
# index group id
# 1: 1 apple 1
# 2: 2 apple 1
# 3: 3 orange 1
# 4: 4 orange 1
# 5: 5 orange 1
# 6: 6 orange 1
# 7: 7 apple 2
# 8: 8 apple 2
# 9: 9 orange 2
# 10: 10 apple 3
Starting from original dt
:
library(dplyr)
dt %>%
group_by(group) %>%
mutate(id = cumsum(c(TRUE, diff(index) > 1))) %>%
ungroup()
# # A tibble: 10 x 3
# index group id
# <int> <chr> <int>
# 1 1 apple 1
# 2 2 apple 1
# 3 3 orange 1
# 4 4 orange 1
# 5 5 orange 1
# 6 6 orange 1
# 7 7 apple 2
# 8 8 apple 2
# 9 9 orange 2
# 10 10 apple 3
Base R, perhaps a little clunky:
out <- do.call(rbind, by(dt, dt$group,
function(x) transform(x, id = cumsum(c(TRUE, diff(index) > 1)))))
out[order(out$index),]
# index group id
# apple.1 1 apple 1
# apple.2 2 apple 1
# orange.3 3 orange 1
# orange.4 4 orange 1
# orange.5 5 orange 1
# orange.6 6 orange 1
# apple.7 7 apple 2
# apple.8 8 apple 2
# orange.9 9 orange 2
# apple.10 10 apple 3
The names can be removed easily with rownames(out) <- NULL
. The order
part isn't necessary, but I wanted to present it in the same order as the other solutions, and do.call
/by
does not preserve the original order.
Another option using data.table::rleid
twice:
dt[, gid := rleid(group)][, id := rleid(gid), .(group)]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With