I would like to apply setdiff between consecutive groups without a for loop, ideally with data.table or a function from the apply family.
Data frame df:
   id group
1  L1     1
2  L2     1
3  L1     2
4  L3     2
5  L4     2
6  L3     3
7  L5     3
8  L6     3
9  L1     4
10 L4     4
11 L2     5
I want to know how many new ids appear between consecutive groups. For example, comparing groups 1 and 2, there are two new ids, L3 and L4, so the result is 2 (not the setdiff result itself, but its length()). Comparing groups 2 and 3, L5 and L6 are the new ids, so the result is again 2, and so on.
Expected results:
new_id
2
2
2
1
Data:
df <- structure(list(id = structure(c(1L, 2L, 1L, 3L, 4L, 3L, 5L, 6L,
1L, 4L, 2L), .Label = c("L1", "L2", "L3", "L4", "L5", "L6"), class = "factor"),
group = c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5)), class = "data.frame", row.names = c(NA,
-11L), .Names = c("id", "group"))
Here is an option with mapply:
lst <- with(df, split(id, group))
mapply(function(x, y) length(setdiff(y, x)), head(lst, -1), tail(lst, -1))
#1 2 3 4
#2 2 2 1
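To make the pairing step easier to follow, here is a minimal base R sketch of the same idea, taken apart. It assumes the ids are stored as plain character vectors rather than a factor, and the names prev, curr, new_ids and new_id are only illustrative. Map() is used instead of mapply() so that the new ids themselves can be inspected before counting them.

```r
# Rebuild the example data (ids as character, an assumption made for simplicity).
df <- data.frame(
  id    = c("L1", "L2", "L1", "L3", "L4", "L3", "L5", "L6", "L1", "L4", "L2"),
  group = c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5),
  stringsAsFactors = FALSE
)

# One vector of ids per group, named by group.
lst <- split(df$id, df$group)

prev <- head(lst, -1)  # every group except the last
curr <- tail(lst, -1)  # every group except the first

# A single comparison: ids in group 2 that are not in group 1.
setdiff(curr[[1]], prev[[1]])

# All consecutive comparisons at once; each element holds the new ids.
new_ids <- Map(setdiff, curr, prev)

# Count the new ids per comparison.
new_id <- lengths(new_ids)
new_id
```

Because head(lst, -1) and tail(lst, -1) have the same length, element i of prev is always the group immediately before element i of curr, which is what makes the element-wise comparison line up.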
Here is a data.table way with merge. Suppose the original data.frame is named dt:
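library(data.table)
setDT(dt)
# Shift every group forward by one so it lines up with the following group
dt2 <- copy(dt)[, group := group + 1]
# After the merge, id.x holds the current group's ids and id.y the previous group's
merge(
  dt, dt2, by = 'group', allow.cartesian = TRUE
)[, .(n = length(setdiff(id.x, id.y))), by = group]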
#    group n
# 1:     2 2
# 2:     3 2
# 3:     4 2
# 4:     5 1