
R: detecting sequences

Tags:

r

Suppose I have a vector of numbers in which some values form consecutive runs and others don't:

x <- c(1,2,3,5,6,7,8,11,14,16,17)

How can I turn this into a string in which each consecutive run is collapsed into a range?

y <- "1-3, 5-8, 11, 14, 16-17"
alki asked Jan 19 '26 04:01



1 Answer

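We create a grouping variable ('gr') by comparing adjacent elements with diff, flagging the differences that are not 1, and taking the cumsum of those flags, so every element of a consecutive run gets the same id. We then use 'gr' in tapply to paste the range of each group of 'x'. Wrapping range in unique makes a single-element group print as one number instead of, say, "11-11".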

gr <- cumsum(c(TRUE,diff(x)!=1))
y <- unname(tapply(x, gr, FUN= function(.x)
                  paste(unique(range(.x)), collapse='-')))
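
To see why the grouping works, it may help to look at the intermediate values for the 'x' from the question. This is only an illustrative breakdown of the line that builds 'gr'; the names 'd' and 'starts' are introduced here and are not part of the code above.

```r
x <- c(1, 2, 3, 5, 6, 7, 8, 11, 14, 16, 17)

# Differences between neighbours; anything other than 1 marks a break.
d <- diff(x)
# 1 1 2 1 1 1 3 3 2 1

# TRUE marks the start of a new run; the first element always starts one.
starts <- c(TRUE, d != 1)

# The cumulative sum turns the start flags into one run id per element.
gr <- cumsum(starts)
# 1 1 1 2 2 2 2 3 4 5 5
```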

If we need a single string, paste the elements of 'y' together with toString, which is a wrapper for paste(x, collapse=', ').

y <- toString(y)
y
#[1] "1-3, 5-8, 11, 14, 16-17"
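
If this is needed in more than one place, the two steps could be wrapped in a small helper. The sketch below is one possible packaging, not something from the original answer: the name 'collapse_runs' is hypothetical, and sorting, dropping duplicates, and returning "" for an empty vector are assumptions about what the input might look like. It also assumes non-negative whole numbers, since a negative value would make the "-" separator ambiguous.

```r
# Hypothetical helper: collapse consecutive runs of whole numbers into ranges.
collapse_runs <- function(x) {
  # Assumption: order and duplicates do not matter, so normalise first.
  x <- sort(unique(x))
  if (length(x) == 0) return("")

  # Same grouping idea as above: a new run starts wherever the gap is not 1.
  gr <- cumsum(c(TRUE, diff(x) != 1))

  # One "lo-hi" (or single value) label per run, joined with ", ".
  toString(tapply(x, gr, function(.x) paste(unique(range(.x)), collapse = "-")))
}

collapse_runs(c(1, 2, 3, 5, 6, 7, 8, 11, 14, 16, 17))
#[1] "1-3, 5-8, 11, 14, 16-17"
collapse_runs(c(17, 3, 1, 2, 2, 5))
#[1] "1-3, 5, 17"
```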

The same grouping works with any aggregate-by-group method. For example, with data.table we convert 'x' to a data.table with setDT (the unnamed column is called 'V1'), group by the run id built with cumsum(...), paste the range of each group, and then apply toString as before.

library(data.table)
y1 <- setDT(list(x))[,paste(unique(range(V1)), collapse='-') ,
                 by = .(cumsum(c(TRUE, diff(V1)!=1)))]$V1
toString(y1)
#[1] "1-3, 5-8, 11, 14, 16-17"
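
As another illustration of the "any aggregate-by-group method" point, here is a base R sketch that uses split and vapply in place of tapply. It is just an alternative spelling of the same idea; the name 'y2' is introduced here for the example.

```r
x <- c(1, 2, 3, 5, 6, 7, 8, 11, 14, 16, 17)
gr <- cumsum(c(TRUE, diff(x) != 1))

# split() gives one numeric vector per run; vapply() builds one label per run
# and checks that each label is a single string.
y2 <- vapply(split(x, gr),
             function(.x) paste(unique(range(.x)), collapse = "-"),
             character(1))
toString(y2)
#[1] "1-3, 5-8, 11, 14, 16-17"
```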
akrun answered Jan 20 '26 20:01
