Multiple uses of setdiff() on consecutive groups without for looping

Tags: dataframe, r

I would like to apply setdiff() between consecutive groups without a for loop, ideally with data.table or a function from the apply family.

Data frame df:

   id group
1  L1     1
2  L2     1
3  L1     2
4  L3     2
5  L4     2
6  L3     3
7  L5     3
8  L6     3
9  L1     4
10 L4     4
11 L2     5

I want to know how many new ids appear between consecutive groups. For example, comparing groups 1 and 2, there are two new ids (L3 and L4), so the result is 2 (obtained by calling length() on the setdiff() output, not from setdiff() directly). Comparing groups 2 and 3, L5 and L6 are the new ids, so the result is again 2, and so on.

Expected result:

new_id
  2
  2
  2
  1

Data:

df <- structure(list(id = structure(c(1L, 2L, 1L, 3L, 4L, 3L, 5L, 6L, 
1L, 4L, 2L), .Label = c("L1", "L2", "L3", "L4", "L5", "L6"), class = "factor"), 
    group = c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5)), class = "data.frame", row.names = c(NA, 
-11L), .Names = c("id", "group"))
Omlere asked Mar 09 '23 10:03


2 Answers

Here is an option with mapply:

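# Split the ids into one vector per group
lst <- with(df, split(id, group))
# Pair each group (x) with the next one (y) and count the ids in y that are not in x
mapply(function(x, y) length(setdiff(y, x)), head(lst, -1), tail(lst, -1))

# Names are the earlier group of each pair
#1 2 3 4
#2 2 2 1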
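To see which ids are new, and not only how many, the same pairing can be split into two steps. This is a minimal sketch that assumes the question's data with id stored as character rather than factor; the names prev, curr and new_ids are illustrative and not part of the answer above.

```r
# Sample data from the question (id kept as character for simplicity)
df <- data.frame(
  id    = c("L1", "L2", "L1", "L3", "L4", "L3", "L5", "L6", "L1", "L4", "L2"),
  group = c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5),
  stringsAsFactors = FALSE
)

lst  <- split(df$id, df$group)  # one vector of ids per group
prev <- head(lst, -1)           # groups 1 to 4
curr <- tail(lst, -1)           # groups 2 to 5

# For each pair, keep the ids of the current group that are absent from the previous one
new_ids <- Map(setdiff, curr, prev)

# Count them; the result is named after the current group of each pair
new_id <- lengths(new_ids)
new_id
# 2 3 4 5
# 2 2 2 1
```

Because curr is passed first, the result is labelled with the later group of each pair, and new_ids[["2"]] holds the ids themselves (L3 and L4).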
Psidom answered Mar 11 '23 23:03


Here is a data.table way with merge. Suppose the original data.frame (df in the question) is named dt:

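library(data.table)

# Convert dt to a data.table by reference
setDT(dt)
# Shift every group up by one, so group g in dt2 holds the ids of group g - 1
dt2 <- copy(dt)[, group := group + 1]

# Within each group, id.x are the current ids and id.y the previous group's ids
merge(
    dt, dt2, by = 'group', allow.cartesian = TRUE
)[, .(n = length(setdiff(id.x, id.y))), by = group]

#    group n
# 1:     2 2
# 2:     3 2
# 3:     4 2
# 4:     5 1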
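The shift-by-one idea behind the merge can also be sketched in base R, for readers without data.table. This assumes that group numbers are consecutive integers and that an id appears at most once per group (a duplicated id would be counted twice, unlike with setdiff()). The names key, prev_key and is_new are illustrative.

```r
# Sample data from the question (id kept as character for simplicity)
df <- data.frame(
  id    = c("L1", "L2", "L1", "L3", "L4", "L3", "L5", "L6", "L1", "L4", "L2"),
  group = c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5),
  stringsAsFactors = FALSE
)

key      <- paste(df$id, df$group)      # each (id, group) pair as it is
prev_key <- paste(df$id, df$group + 1)  # the same pairs moved one group forward

# A row is new when its id did not occur in the preceding group
is_new <- !(key %in% prev_key)

# Sum the flags per group and drop the first group, which has no predecessor
counts <- tapply(is_new, df$group, sum)[-1]
counts
# 2 3 4 5
# 2 2 2 1
```

Unlike the merge, this variant also reports 0 for a group that brings no new ids, since tapply() keeps every group level.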
mt1022 answered Mar 11 '23 23:03