Let's say I have the following vector of numbers:
vec = c(1, 2, 3, 5, 7, 8, 9, 10, 11, 12)
I'm looking for a function that will create a string summarizing the list of numbers the way a human would. That is, each run of consecutive numbers (here 1, 2, 3 and 7, 8, 9, 10, 11, 12) is collapsed into its start and end value:
"1-3, 5, 7-12"
How can I do this in R?
Adding another alternative, you could use a deparsing approach. For example:
deparse(c(1L, 2L, 3L))
#[1] "1:3"
Taking advantage of the fact that as.character deparses each element of a list given as input, we could use:
as.character(split(as.integer(vec), cumsum(c(TRUE, diff(vec) != 1))))
#[1] "1:3" "5" "7:12"
toString(gsub(":", "-", .Last.value))
#[1] "1-3, 5, 7-12"
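To make the grouping step easier to follow, here is a minimal sketch that spells out the intermediate values of the one-liner above. The variable names steps, new_run, group_id and runs are only illustrative and not part of the original answer. Note that as.integer matters here: deparsing a double vector such as c(1, 2, 3) gives "c(1, 2, 3)" rather than "1:3".

```r
vec <- c(1, 2, 3, 5, 7, 8, 9, 10, 11, 12)

# Differences between neighbouring values: 1 1 2 2 1 1 1 1 1
steps <- diff(vec)

# TRUE marks the first element of each run; the first element always starts one
new_run <- c(TRUE, steps != 1)

# Running count of run starts gives one id per run: 1 1 1 2 3 3 3 3 3 3
group_id <- cumsum(new_run)

# One integer vector per run; as.character() then deparses each of them
runs <- split(as.integer(vec), group_id)
as.character(runs)
```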
I assume that the vector is sorted as in the example. If not, use vec <- sort(vec) beforehand.
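As a rough illustration of why the ordering matters, the sketch below uses a made-up vector called messy. Dropping duplicates with unique is an extra assumption of mine, not something the answer requires; it avoids differences of 0, which the != 1 test would otherwise treat as the start of a new run.

```r
# Hypothetical unsorted input with one duplicated value
messy <- c(9, 2, 1, 3, 3, 7, 8, 5)

# On the raw vector the differences are not meaningful for run detection
diff(messy)

# Sorting (and, as an assumption, de-duplicating) restores the expected pattern
clean <- sort(unique(messy))                        # 1 2 3 5 7 8 9
groups_clean <- cumsum(c(TRUE, diff(clean) != 1))   # 1 1 1 2 3 3 3
```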
Edit note: @DavidArenburg spotted a mistake in my original answer where c(min(x), x) should actually be c(0, x). Since we now know that we always need to add a 0 in the first place, we can omit the first step of creating x and do it "on the fly". The original answer and additional options are now edited to reflect that (you can check the edit history for the original post). Thanks David!
A note on calls to unname: I used unname(sapply(...)) to ensure that the resulting vector is not named, otherwise it would be named 0:(n-1) where n equals the length of new_vec. As @Tensibai noted correctly in the comments, this doesn't matter if the final aim is to generate a length-1 character vector as produced by running toString(new_vec), since vector names will be omitted by toString anyway.
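A small check of that point, as a sketch: the variable named is only for illustration, and the grouping expression is the same one used in the options that follow.

```r
vec <- c(1, 2, 3, 5, 7, 8, 9, 10, 11, 12)

# Without unname(), sapply() keeps the group labels from split() as names
named <- sapply(split(vec, c(0, cumsum(diff(vec) > 1))), function(y) {
  paste(unique(range(y)), collapse = "-")
})
names(named)

# toString() pastes only the values, so the names make no difference here
identical(toString(named), toString(unname(named)))
```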
One option (possibly not the shortest) would be:
new_vec <- unname(sapply(split(vec, c(0, cumsum(diff(vec) > 1))), function(y) {
  if (length(y) == 1) y else paste0(head(y, 1), "-", tail(y, 1))
}))
Result:
new_vec
#[1] "1-3" "5" "7-12"
toString(new_vec)
#[1] "1-3, 5, 7-12"
Thanks to @Zelazny7 it can be shortened by using the range function:
new_vec <- unname(sapply(split(vec, c(0, cumsum(diff(vec) > 1))), function(y) {
  paste(unique(range(y)), collapse = "-")
}))
Thanks to @DavidArenburg it can be further shortened by using tapply instead of sapply + split:
new_vec <- unname(tapply(vec, c(0, cumsum(diff(vec) > 1)), function(y) {
  paste(unique(range(y)), collapse = "-")
}))
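If this is needed in more than one place, the tapply variant could be wrapped in a small helper. The following is only a sketch: the function name collapse_runs, the sep argument, the input checks and the empty-input behaviour are my own assumptions, not part of the answers above. It uses the != 1 test for run starts, which is equivalent to > 1 for strictly increasing whole numbers.

```r
# Hypothetical helper: collapse runs of consecutive numbers into "start-end"
collapse_runs <- function(x, sep = "-") {
  # Assumption: input is numeric and strictly increasing (sorted, no duplicates)
  stopifnot(is.numeric(x), !is.unsorted(x, strictly = TRUE))

  # tapply() needs at least one element, so handle empty input separately
  if (length(x) == 0) return("")

  # A new run starts wherever the step from the previous value is not 1
  groups <- cumsum(c(TRUE, diff(x) != 1))

  # range() gives start and end; unique() collapses single-element runs
  toString(tapply(x, groups, function(y) {
    paste(unique(range(y)), collapse = sep)
  }))
}

collapse_runs(c(1, 2, 3, 5, 7, 8, 9, 10, 11, 12))
#[1] "1-3, 5, 7-12"
```

Failing loudly on unsorted input is a design choice; calling sort inside the helper instead would be just as reasonable if silent reordering is acceptable.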