
Collapse consecutive runs of numbers to a string of ranges


Let's say I have the following vector of numbers:

vec = c(1, 2, 3, 5, 7, 8, 9, 10, 11, 12)

I'm looking for a function that will create a string summarizing the list of numbers the way a human would. That is, each run of consecutive numbers (here 1, 2, 3 and 7, 8, 9, 10, 11, 12) is collapsed into its start and end value:

"1-3, 5, 7-12"

How can I do this in R?

CephBirk asked Jan 06 '16 15:01


2 Answers

As another alternative, you could use a deparsing approach. For example:

deparse(c(1L, 2L, 3L))
#[1] "1:3"
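
The L suffix matters here. As a small check (a sketch, assuming a reasonably recent R version), the compact start:end form appears for a run of consecutive integers, while the same values stored as doubles deparse to a c(...) call. That is why the double vector vec from the question is passed through as.integer in the next step:

```r
# Consecutive integers deparse to the compact "start:end" form.
int_form <- deparse(c(1L, 2L, 3L))
int_form
#[1] "1:3"

# The same values stored as doubles deparse to a c(...) call instead.
dbl_form <- deparse(c(1, 2, 3))
dbl_form
#[1] "c(1, 2, 3)"
```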

Since as.character deparses the elements of a list it is given as input, we could use:

ranges <- as.character(split(as.integer(vec), cumsum(c(TRUE, diff(vec) != 1))))
ranges
#[1] "1:3"  "5"    "7:12"
toString(gsub(":", "-", ranges))
#[1] "1-3, 5, 7-12"
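
The grouping key passed to split is the core of this approach, so it may help to look at it step by step. The following is only an illustrative breakdown; the intermediate names gaps, starts and run_id are introduced here and are not part of the answer:

```r
vec <- c(1, 2, 3, 5, 7, 8, 9, 10, 11, 12)

# Gap between each element and its predecessor.
gaps <- diff(vec)
gaps
#[1] 1 1 2 2 1 1 1 1 1

# TRUE marks the first element of a run; the leading TRUE covers vec[1].
starts <- c(TRUE, gaps != 1)

# The running count of run starts gives each element its run number.
run_id <- cumsum(starts)
run_id
# [1] 1 1 1 2 3 3 3 3 3 3
```

Elements that share a run number end up in the same list component after split.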
alexis_laz answered Sep 18 '22 18:09


I assume that the vector is sorted, as in the example. If not, use vec <- sort(vec) beforehand.

Edit note: @DavidArenburg spotted a mistake in my original answer, where c(min(x), x) should actually be c(0, x). Since we now know that a 0 always has to go in the first position, we can omit the separate step of creating x and do it "on the fly". The original answer and the additional options have been edited to reflect that (you can check the edit history for the original post). Thanks David!

A note on calls to unname: I used unname(sapply(...)) to ensure that the resulting vector is not named, otherwise it would be named 0:(n-1) where n equals the length of new_vec. As @Tensibai noted correctly in the comments, this doesn't matter if the final aim is to generate a length-1 character vector as produced by running toString(new_vec) since vector names will be omitted by toString anyway.


One option (possibly not the shortest) would be:

new_vec <- unname(sapply(split(vec, c(0, cumsum(diff(vec) > 1))), function(y) {
  if(length(y) == 1) y else paste0(head(y, 1), "-", tail(y, 1))
}))

Result:

new_vec
#[1] "1-3"  "5"    "7-12"
toString(new_vec)
#[1] "1-3, 5, 7-12"
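
The earlier note on unname can be checked directly. This is a small sketch that repeats the computation above without the unname wrapper (the name named_vec is introduced here for illustration only):

```r
vec <- c(1, 2, 3, 5, 7, 8, 9, 10, 11, 12)

# Same computation as above, but without the unname() wrapper.
named_vec <- sapply(split(vec, c(0, cumsum(diff(vec) > 1))), function(y) {
  if (length(y) == 1) y else paste0(head(y, 1), "-", tail(y, 1))
})

# The names come from the grouping values used by split().
names(named_vec)
#[1] "0" "1" "2"

# toString() drops the names, so both versions give the same string.
identical(toString(named_vec), toString(unname(named_vec)))
#[1] TRUE
```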

Thanks to @Zelazny7 it can be shortened by using the range function:

new_vec <- unname(sapply(split(vec, c(0, cumsum(diff(vec) > 1))), function(y) {
  paste(unique(range(y)), collapse = "-")
}))
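
To see why unique(range(y)) removes the need for the length check, here is a minimal illustration on two hand-picked groups (the names single and run are only for this example):

```r
# A single-element group: range() returns the value twice, unique() keeps one.
single <- paste(unique(range(5)), collapse = "-")
single
#[1] "5"

# A longer run: range() returns min and max, which are joined with "-".
run <- paste(unique(range(c(7, 8, 9, 10, 11, 12))), collapse = "-")
run
#[1] "7-12"
```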

Thanks to @DavidArenburg it can be further shortened by using tapply instead of sapply + split:

new_vec <- unname(tapply(vec, c(0, cumsum(diff(vec) > 1)), function(y) {
  paste(unique(range(y)), collapse = "-")
}))
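
For reuse, the tapply variant could be wrapped in a small function. The following is only a sketch under a few assumptions: collapse_runs is a hypothetical name, the input is taken to hold whole numbers, and sorting plus dropping duplicates is added here as a convenience that the answer itself leaves to the caller:

```r
# collapse_runs() is a hypothetical helper name, not part of base R.
collapse_runs <- function(x) {
  # Nothing to summarize for an empty input.
  if (length(x) == 0) return("")
  # Sort and drop duplicates so every gap between neighbours is at least 1.
  x <- sort(unique(x))
  # Run identifier: increases by one wherever the gap exceeds 1.
  run_id <- c(0, cumsum(diff(x) > 1))
  # Collapse each run to "start-end", or to the value itself if it stands alone.
  pieces <- tapply(x, run_id, function(y) paste(unique(range(y)), collapse = "-"))
  toString(pieces)
}

collapse_runs(c(12, 5, 1, 3, 2, 2, 7, 8, 9, 10, 11))
#[1] "1-3, 5, 7-12"
```

The early return is there because tapply needs the data and the grouping vector to have the same length, which would not hold for an empty input.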
talat answered Sep 17 '22 18:09