For sequences measured at different time points, I am interested in the time point at which each sequence originates, resetting the origin if there is a skip.
dd <- data.frame(seq = letters[c(1:6,1,6:7,1:3,7:8,1)],
                 grp = rep(1:5, c(3,4,5,2,1)),
                 stringsAsFactors = TRUE)  # factor, so as.numeric(seq) gives 1-8 in R >= 4.0
o2 <- c(1,1,1,2,2,2,1,2,3,1,3,3,3,4,5)
# x: sequence, y: time point, color: desired origin time point
par(mar = c(5, 5, 2, 5), las = 1, bty = 'n', xpd = NA)
plot(as.numeric(dd$seq), dd$grp, col = o2, pch = 16,
     cex = 3, xaxt = 'n', yaxt = 'n', xlab = 'seq', ylab = '')
axis(1, at = 1:8, letters[1:8], lwd = 0)
axis(2, at = 1:5, paste0('time ', 1:5))
axis(4, at = 1:5, palette()[1:5])
Maybe this illustrates it better: for each run of consecutive time points in which a sequence occurs, I want to assign the whole run to its lowest (earliest) time point and color it accordingly.

So the first group of a starts at time 1 and continues uninterrupted until time 3, so it is treated as the same sequence, originating at time 1. Since a appears again at time 5, after a gap at time 4, that second group is assumed to be unrelated to the first and is colored for time point 5.
b and c each appear at times 1 and 3 but not at time 2, so they have two origins and each occurrence is colored by its own time point.
My desired result is the vector o2:
# split(cbind(dd, desired = o2), dd$grp)
cbind(dd, desired = o2)
#    seq grp desired
#  1   a   1       1
#  2   b   1       1
#  3   c   1       1
#  4   d   2       2
#  5   e   2       2
#  6   f   2       2
#  7   a   2       1
#  8   f   3       2
#  9   g   3       3
# 10   a   3       1
# 11   b   3       3
# 12   c   3       3
# 13   g   4       3
# 14   h   4       4
# 15   a   5       5
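A minimal base-R sketch of the same rule may help pin it down. It assumes that within each seq the rows are already ordered by grp (true for dd as constructed above) and uses ave() rather than either of the dplyr approaches; run_origin is a hypothetical helper name, not from the thread.

```r
dd <- data.frame(seq = letters[c(1:6,1,6:7,1:3,7:8,1)],
                 grp = rep(1:5, c(3,4,5,2,1)))
o2 <- c(1,1,1,2,2,2,1,2,3,1,3,3,3,4,5)

# Hypothetical helper: given one sequence's time points in ascending order,
# return for each element the first time point of the consecutive run it is in
run_origin <- function(g) {
  run_id <- cumsum(c(TRUE, diff(g) != 1))  # any step other than 1 starts a new run
  ave(g, run_id, FUN = min)                # earliest time point in each run
}

# Apply the helper separately to each sequence; rows keep their original order
origin <- ave(dd$grp, dd$seq, FUN = run_origin)
print(origin)
print(all(origin == o2))
```

The result is an integer vector while o2 is double, so the check compares with == rather than identical().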
Here's one possibility using dplyr:
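library(dplyr)

pd <- dd %>% arrange(seq, grp) %>%
  group_by(seq) %>%
  # a step other than +1 from the previous time point starts a new set
  mutate(set = cumsum(grp - lag(grp, default = 100) != 1)) %>%
  group_by(seq, set) %>%
  # every row in a set gets the set's earliest time point
  mutate(colgrp = min(grp))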
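To see what the set column does, here is the same arithmetic applied by hand to sequence a, which occurs at times 1, 2, 3 and 5. To keep the sketch free of packages, lag(grp, default = 100) is stood in for by the base-R expression c(100, head(grp, -1)), which shifts the vector by one and fills the first slot with 100; ave() is likewise a stand-in for the second group_by() plus min().

```r
grp <- c(1, 2, 3, 5)                  # time points at which "a" occurs

lagged  <- c(100, head(grp, -1))      # 100 1 2 3 (base-R stand-in for lag())
new_run <- (grp - lagged) != 1        # TRUE FALSE FALSE TRUE
set     <- cumsum(new_run)            # 1 1 1 2
colgrp  <- ave(grp, set, FUN = min)   # 1 1 1 5
print(set)
print(colgrp)
```

A step of exactly 1 means the sequence continues from the previous time point; anything else, including the first element thanks to the out-of-range default, opens a new set.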
You can then plot pd with:
par(mar = c(5, 5, 2, 5), las = 1, bty = 'n', xpd = NA)
plot(as.numeric(pd$seq), pd$grp, col = pd$colgrp, pch = 16,
     cex = 3, xaxt = 'n', yaxt = 'n', xlab = 'seq', ylab = '')
axis(1, at = 1:8, letters[1:8], lwd = 0)
axis(2, at = 1:5, paste0('time ', 1:5))
axis(4, at = 1:5, palette()[1:5])
Note the odd default = 100 value. Ideally I'd use -1 or some other value outside the range of grp, but because of a bug at the time of writing, negative numbers could not be passed as the default.
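One way to avoid choosing a sentinel at all, offered as an alternative rather than the answer's own code, is to mark the first element explicitly: c(TRUE, diff(grp) != 1) flags a new run at position 1 and wherever the step is not 1. Inside the pipeline this would read mutate(set = cumsum(c(TRUE, diff(grp) != 1))). The sketch below stays in base R and checks that both versions give the same run labels for every sequence in dd; with_sentinel and with_diff are hypothetical names.

```r
dd <- data.frame(seq = letters[c(1:6,1,6:7,1:3,7:8,1)],
                 grp = rep(1:5, c(3,4,5,2,1)))

# Run labels using 100 as a stand-in for the missing previous value
with_sentinel <- function(g) cumsum((g - c(100, head(g, -1))) != 1)

# Run labels without a sentinel: the first element always starts a run
with_diff <- function(g) cumsum(c(TRUE, diff(g) != 1))

# Time points per sequence (rows are already ordered by grp)
by_seq <- split(dd$grp, dd$seq)

same <- sapply(by_seq, function(g) identical(with_sentinel(g), with_diff(g)))
print(same)
```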
Inspired by my answer to "rle-like function that catches runs of adjacent integers":
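library(dplyr)
dd2 <- dd %>% group_by(seq) %>%
  arrange(grp) %>%  # sort by time point (arrange() ignores grouping by default)
  # within a sequence, grp minus the row offset is constant along a run
  mutate(origin_group = grp - 0:(n() - 1)) %>%
  group_by(seq, origin_group) %>%
  # earliest time point of each run
  mutate(origin = min(grp))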
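The subtraction works because, within one sequence sorted by grp, a run of consecutive time points rises by exactly 1 per row, as does the row index, so grp minus the row index stays constant along a run and jumps at a gap. Here it is by hand for sequences a (times 1, 2, 3, 5) and b (times 1, 3) in plain base R, with ave() standing in for the second group_by() plus min():

```r
a <- c(1, 2, 3, 5)   # time points of sequence "a"
b <- c(1, 3)         # time points of sequence "b"

# Same idea as grp - 0:(n() - 1): subtract each element's 0-based position
origin_group_a <- a - 0:(length(a) - 1)   # 1 1 1 2
origin_group_b <- b - 0:(length(b) - 1)   # 1 2

# The origin of each run is the smallest time point sharing its label
origin_a <- ave(a, origin_group_a, FUN = min)   # 1 1 1 5
origin_b <- ave(b, origin_group_b, FUN = min)   # 1 3
print(origin_a)
print(origin_b)
```

The labels are only identifiers: label 2 for a and label 2 for b are unrelated, which is why the pipeline groups by seq and origin_group together.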
This is very similar to MrFlick's answer; I just use a slightly different method for the first grouping. The plot is produced the same way, using dd2:
par(mar = c(5, 5, 2, 5), las = 1, bty = 'n', xpd = NA)
plot(as.numeric(dd2$seq), dd2$grp, col = dd2$origin, pch = 16,
     cex = 3, xaxt = 'n', yaxt = 'n', xlab = 'seq', ylab = '')
axis(1, at = 1:8, letters[1:8], lwd = 0)
axis(2, at = 1:5, paste0('time ', 1:5))
axis(4, at = 1:5, palette()[1:5])
