
Find originating time points within groups

Tags:

r

For sequences measured at different time points, I am interested in the time point at which each sequence originates, resetting the originating time point whenever there is a skip.

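dd <- data.frame(seq = letters[c(1:6,1,6:7,1:3,7:8,1)],
                 grp = rep(1:5, c(3,4,5,2,1)),
                 # as.numeric(dd$seq) below needs a factor (no longer the default since R 4.0)
                 stringsAsFactors = TRUE)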
o2 <- c(1,1,1,2,2,2,1,2,3,1,3,3,3,4,5)

par(mar = c(5, 5, 2, 5), las = 1, bty = 'n', xpd = NA)
plot(as.numeric(dd$seq), dd$grp, col = o2, pch = 16,
     cex = 3, xaxt = 'n', yaxt = 'n', xlab = 'seq', ylab = '')
axis(1, at = 1:8, letters[1:8], lwd = 0)
axis(2, at = 1:5, paste0('time ', 1:5))
axis(4, at = 1:5, palette()[1:5])

Maybe this will better illustrate: For each sequence that occurs consecutively, I want to assign that group to the lowest time point and color it accordingly.


So the first group of a starts at time 1 and continues uninterrupted until time 3, so theoretically this is the same sequence, which originated at time 1. The second group of a appears at time 5, after a skip at time 4, so it is assumed to be unrelated to the first group and is colored for time point 5.

b and c have two origins so they are colored individually according to the time points.

My desired result is this vector, o2:

# split(cbind(dd, desired = o2), dd$grp)
cbind(dd, desired = o2)

#    seq grp desired
# 1    a   1       1
# 2    b   1       1
# 3    c   1       1
# 4    d   2       2
# 5    e   2       2
# 6    f   2       2
# 7    a   2       1
# 8    f   3       2
# 9    g   3       3
# 10   a   3       1
# 11   b   3       3
# 12   c   3       3
# 13   g   4       3
# 14   h   4       4
# 15   a   5       5
rawr asked Mar 08 '26 15:03


2 Answers

Here's a possibility using dplyr:

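library(dplyr)

pd <- dd %>% arrange(seq, grp) %>%
    group_by(seq) %>%
    # start a new set whenever the gap to the previous time point is not 1
    mutate(set = cumsum(grp - lag(grp, default = 100) != 1)) %>%
    group_by(seq, set) %>%
    # color each set by its earliest time point
    mutate(colgrp = min(grp))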

You can then plot it with:

par(mar = c(5, 5, 2, 5), las = 1, bty = 'n', xpd = NA)
plot(as.numeric(pd$seq), pd$grp, col = pd$colgrp, pch = 16,
     cex = 3, xaxt = 'n', yaxt = 'n', xlab = 'seq', ylab = '')
axis(1, at = 1:8, letters[1:8], lwd = 0)
axis(2, at = 1:5, paste0('time ', 1:5))
axis(4, at = 1:5, palette()[1:5])

Note the weird default = 100 value. Ideally I'd like to use -1 or some other value outside the range of grp, but thanks to this bug you can't enter negative numbers.
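To make the cumsum step concrete, here is a minimal base R sketch on the time points of sequence a alone. It is an illustration rather than part of the answer above: it swaps lag() for diff(), which sidesteps the sentinel default entirely, and the names new_run, run_id and origin are made up for this example.

```r
# Time points at which sequence 'a' is observed in the example data
grp <- c(1, 2, 3, 5)

# A new run starts at the first observation and wherever the gap
# to the previous time point is not exactly 1
new_run <- c(TRUE, diff(grp) != 1)

# Counting the run starts gives every observation a run id
run_id <- cumsum(new_run)
run_id
# 1 1 1 2

# The origin of each observation is the smallest time point in its run
origin <- ave(grp, run_id, FUN = min)
origin
# 1 1 1 5
```

The first three observations share a run and inherit time 1; the observation at time 5 follows a skip, so it starts its own run.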

MrFlick answered Mar 11 '26 03:03


Inspired by my answer to "rle-like function that catches runs of adjacent integers":

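library(dplyr)

# assign the result to dd2, which the plotting code below uses
dd2 <- dd %>% group_by(seq) %>%
    arrange(grp) %>%
    # grp minus a 0-based counter is constant within a run of consecutive time points
    mutate(origin_group = grp - 0:(n() - 1)) %>%
    group_by(seq, origin_group) %>%
    mutate(origin = min(grp))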

This is very similar to MrFlick's answer; I just use a slightly different method for the first grouping.

par(mar = c(5, 5, 2, 5), las = 1, bty = 'n', xpd = NA)
plot(as.numeric(dd2$seq), dd2$grp, col = dd2$origin, pch = 16,
     cex = 3, xaxt = 'n', yaxt = 'n', xlab = 'seq', ylab = '')
axis(1, at = 1:8, letters[1:8], lwd = 0)
axis(2, at = 1:5, paste0('time ', 1:5))
axis(4, at = 1:5, palette()[1:5])
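The grouping trick in this answer, subtracting a 0-based counter from grp, can also be tried without dplyr. The following is a rough base R sketch using the example data from the question. It assumes the rows are already ordered by grp, as they are in dd; the names key and origin are chosen for illustration only.

```r
# Example data and desired result, copied from the question
dd <- data.frame(seq = letters[c(1:6,1,6:7,1:3,7:8,1)],
                 grp = rep(1:5, c(3,4,5,2,1)))
o2 <- c(1,1,1,2,2,2,1,2,3,1,3,3,3,4,5)

# Within each seq, subtract a 0-based counter from grp. Consecutive
# time points (e.g. 1, 2, 3) collapse to one value; a skip changes it.
key <- ave(dd$grp, dd$seq, FUN = function(g) g - (seq_along(g) - 1))

# Each (seq, key) pair identifies one uninterrupted run;
# its origin is the smallest time point in that run
origin <- ave(dd$grp, paste(dd$seq, key), FUN = min)

# Compare with the desired vector from the question
all(origin == o2)
# TRUE
```

paste() is used to build a single grouping key so that ave() only ever sees combinations of seq and key that actually occur in the data.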


Gregor Thomas answered Mar 11 '26 03:03


