Find originating time points within groups

Question

For sequences measured at different time points, I am interested in the time point at which each sequence originates, resetting the originating time point if there is a skip.

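# stringsAsFactors = TRUE: since R 4.0 strings are not factors by default, and the plot needs as.numeric(dd$seq)
dd <- data.frame(seq = letters[c(1:6,1,6:7,1:3,7:8,1)],
                 grp = rep(1:5, c(3,4,5,2,1)),
                 stringsAsFactors = TRUE)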
o2 <- c(1,1,1,2,2,2,1,2,3,1,3,3,3,4,5)

par(mar = c(5, 5, 2, 5), las = 1, bty = 'n', xpd = NA)
plot(as.numeric(dd$seq), dd$grp, col = o2, pch = 16,
     cex = 3, xaxt = 'n', yaxt = 'n', xlab = 'seq', ylab = '')
axis(1, at = 1:8, letters[1:8], lwd = 0)
axis(2, at = 1:5, paste0('time ', 1:5))
axis(4, at = 1:5, palette()[1:5])

Maybe this will better illustrate: For each sequence that occurs consecutively, I want to assign that group to the lowest time point and color it accordingly.


So the first group of a starts at time 1 and continues uninterrupted until 3, so theoretically this is the same sequence which originated at time 1. Since there is another group of a, this is assumed not to be related to the other group of a and colored for time point 5.

b and c have two origins so they are colored individually according to the time points.

My desired result is this vector, o2:

# split(cbind(dd, desired = o2), dd$grp)
cbind(dd, desired = o2)

#    seq grp desired
# 1    a   1       1
# 2    b   1       1
# 3    c   1       1
# 4    d   2       2
# 5    e   2       2
# 6    f   2       2
# 7    a   2       1
# 8    f   3       2
# 9    g   3       3
# 10   a   3       1
# 11   b   3       3
# 12   c   3       3
# 13   g   4       3
# 14   h   4       4
# 15   a   5       5

MrFlick · Accepted Answer

Here's a possibility using dplyr

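library(dplyr)

pd <- dd %>% arrange(seq, grp) %>%
    group_by(seq) %>%
    # start a new set whenever the time point is not exactly one after the previous one
    mutate(set = cumsum(grp - lag(grp, default = 100) != 1)) %>%
    group_by(seq, set) %>%
    # the origin of each run is its earliest time point
    mutate(colgrp = min(grp))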

Which you plot with

par(mar = c(5, 5, 2, 5), las = 1, bty = 'n', xpd = NA)
plot(as.numeric(pd$seq), pd$grp, col = pd$colgrp, pch = 16,
     cex = 3, xaxt = 'n', yaxt = 'n', xlab = 'seq', ylab = '')
axis(1, at = 1:8, letters[1:8], lwd = 0)
axis(2, at = 1:5, paste0('time ', 1:5))
axis(4, at = 1:5, palette()[1:5])

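Note the weird default=100 value. It only has to guarantee that the first row of each sequence starts a new set, so any value that is never exactly one less than a first time point will do. Ideally I'd like to use -1 or something outside the range, but thanks to this bug you can't enter negative numbers.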
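To make the `cumsum(... != 1)` step easier to follow, here is a minimal base R sketch on a single sorted vector of time points. The values mirror sequence a from the question; the names `step`, `run_id` and `origin` are only illustrative, and padding with `Inf` instead of a magic default is my own assumption rather than part of the answer above.

```r
# Time points at which one sequence (here: 'a') was observed, sorted.
grp <- c(1, 2, 3, 5)

# Difference to the previous time point. The first element has no
# predecessor, so pad with Inf to force a new run there.
step <- c(Inf, diff(grp))

# A new run starts wherever the step is not exactly 1.
run_id <- cumsum(step != 1)
run_id
# [1] 1 1 1 2

# Origin of each observation = earliest time point in its run.
origin <- ave(grp, run_id, FUN = min)
origin
# [1] 1 1 1 5
```

The dplyr pipeline does the same thing per sequence, with `lag()` playing the role of `diff()` and the second `group_by()` playing the role of `ave()`.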

Gregor Thomas · Answer

Inspired by my answer to "rle-like function that catches runs of adjacent integers":

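library(dplyr)

# assign the result so the plotting code below can use dd2
dd2 <- dd %>% group_by(seq) %>%
    arrange(grp) %>%
    # within a run of consecutive time points, grp minus its 0-based position is constant
    mutate(origin_group = grp - 0:(n() - 1)) %>%
    group_by(seq, origin_group) %>%
    mutate(origin = min(grp))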

This is very similar to MrFlick's answer; I just use a slightly different method of doing the first grouping.
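As a small illustration of that first grouping, the sketch below applies the same subtraction to one sorted vector in base R. The values are those of sequence a from the question, and `idx` is just an illustrative name.

```r
# Sorted time points for one sequence (here: 'a').
grp <- c(1, 2, 3, 5)

# 0-based position within the sequence: 0, 1, 2, 3
idx <- seq_along(grp) - 1

# Consecutive time points rise in step with the position, so the
# difference stays constant within a run and changes after a skip.
origin_group <- grp - idx
origin_group
# [1] 1 1 1 2
```

The value of `origin_group` is only a label for the run (the second run gets 2, not 5), which is why the pipeline still takes `min(grp)` within each label to get the actual originating time point.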

par(mar = c(5, 5, 2, 5), las = 1, bty = 'n', xpd = NA)
plot(as.numeric(dd2$seq), dd2$grp, col = dd2$origin, pch = 16,
     cex = 3, xaxt = 'n', yaxt = 'n', xlab = 'seq', ylab = '')
axis(1, at = 1:8, letters[1:8], lwd = 0)
axis(2, at = 1:5, paste0('time ', 1:5))
axis(4, at = 1:5, palette()[1:5])
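Because `arrange()` reorders the rows, the result cannot be compared to o2 position by position. As a rough cross-check, the following base R sketch applies the same run idea while keeping the original row order. It is not either answerer's code; `origin_of` and `res` are hypothetical names introduced here.

```r
dd <- data.frame(seq = letters[c(1:6,1,6:7,1:3,7:8,1)],
                 grp = rep(1:5, c(3,4,5,2,1)))
o2 <- c(1,1,1,2,2,2,1,2,3,1,3,3,3,4,5)

# For the time points of one sequence, return the origin of each one,
# in the same order in which the time points were passed in.
origin_of <- function(g) {
  ord <- order(g)                      # positions that sort the time points
  s   <- g[ord]                        # sorted time points
  run <- s - seq_along(s)              # constant within a consecutive run
  out <- numeric(length(g))
  out[ord] <- ave(s, run, FUN = min)   # earliest time point per run
  out
}

# Apply per sequence; ave() puts the results back in the original row order.
res <- ave(dd$grp, dd$seq, FUN = origin_of)

# Stops with an error if any row disagrees with the desired vector.
stopifnot(all(res == o2))
```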
