I'm pretty sure that you all agree that rle is one of those "gotcha" functions in R. Is there any similar function that can "catch" a "run" of adjacent integer values?
So, if I have a vector like this one:
x <- c(3:5, 10:15, 17, 22, 23, 35:40)
and I call that esoteric function, I'll get a response like this one:
lengths: 3, 6, 1, 2, 6
values: (3,4,5), (10,11,12... # you get the point
It's not that hard to write a function like this, but still... any ideas?
1) Calculate values and then lengths based on values
s <- split(x, cumsum(c(0, diff(x) != 1)))
run.info <- list(lengths = unname(sapply(s, length)), values = unname(s))
Running it using x from the question gives this:
> str(run.info)
List of 2
$ lengths: int [1:5] 3 6 1 2 6
$ values :List of 5
..$ : num [1:3] 3 4 5
..$ : num [1:6] 10 11 12 13 14 15
..$ : num 17
..$ : num [1:2] 22 23
..$ : num [1:6] 35 36 37 38 39 40
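To see why the split in 1) works, it may help to look at the grouping vector on its own. This is only an illustrative sketch using the x from the question; the names brk and g are introduced here and are not part of the original answer. A new run starts wherever diff(x) is not 1, and cumsum() turns those break flags into a run id for every element.

```r
x <- c(3:5, 10:15, 17, 22, 23, 35:40)

# TRUE wherever the step to the next element is not exactly 1,
# i.e. wherever a new run begins
brk <- diff(x) != 1

# Prepend 0 for the first element, then accumulate the break flags:
# every element of a run ends up with the same id
g <- cumsum(c(0, brk))
g
# 0 0 0 1 1 1 1 1 1 2 3 3 4 4 4 4 4 4
```

split(x, g) then collects the elements that share an id, which is how the values list in 1) is built.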
2) Calculate lengths and then values based on lengths
Here is a second solution based on Gregor's length calculation:
lens <- rle(x - seq_along(x))$lengths
list(lengths = lens, values = unname(split(x, rep(seq_along(lens), lens))))
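The reason the length calculation in 2) works: within a run of consecutive integers both x and seq_along(x) grow by 1 at each step, so their difference stays constant, and it changes exactly where a run is broken. A small sketch with the x from the question (the name d is introduced here for illustration only):

```r
x <- c(3:5, 10:15, 17, 22, 23, 35:40)

# Constant within each run of consecutive values,
# different across every break
d <- x - seq_along(x)
d
# 2 2 2 6 6 6 6 6 6 7 11 11 22 22 22 22 22 22

# Ordinary rle() on d therefore yields the run lengths
lens <- rle(d)$lengths
lens
# 3 6 1 2 6
```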
3) Calculate lengths and values independently of each other
This one seems inefficient since it calculates each of lengths and values from scratch, and it also seems somewhat overly complex, but it does manage to get it all down to a single statement, so I thought I would add it as well. It's basically just a mix of the prior two solutions marked 1) and 2) above; nothing really new relative to those two.
list(lengths = rle(x - seq_along(x))$lengths,
values = unname(split(x, cumsum(c(0, diff(x) != 1)))))
EDIT: Added second solution.
EDIT: Added third solution.
How about
rle(x - seq_along(x))$lengths
# 3 6 1 2 6
The lengths are what you want. I'm blanking on an equally clever way to get the proper values, but with cumsum() and the original x they're very accessible.
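One way the cumsum() idea might look, as a sketch rather than the only option: the cumulative sum of the lengths gives the index of the last element of each run, and subtracting the length again gives the index of the first. The names ends, starts, run_first, run_last and run_values are made up for this example.

```r
x <- c(3:5, 10:15, 17, 22, 23, 35:40)
lens <- rle(x - seq_along(x))$lengths

ends   <- cumsum(lens)        # index of the last element of each run
starts <- ends - lens + 1L    # index of the first element of each run

run_first <- x[starts]        # 3 10 17 22 35
run_last  <- x[ends]          # 5 15 17 23 40

# Full runs, if the elements themselves are wanted
run_values <- Map(function(s, e) x[s:e], starts, ends)
```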
As you say, it is easy enough to write something similar to rle. Indeed, adjusting the code for rle by adding + 1 might give something like
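rle_consec <- function(x)
{
    # Same input check as base rle()
    if (!is.vector(x) && !is.list(x))
        stop("'x' must be an atomic vector")
    n <- length(x)
    if (n == 0L)
        return(structure(list(lengths = integer(), values = x),
                         class = "rle_consec"))
    # TRUE where the next element is not the current one plus 1,
    # i.e. where a run of consecutive values ends
    y <- x[-1L] != x[-n] + 1
    # Indices of the last element of each run (NA also breaks a run)
    i <- c(which(y | is.na(y)), n)
    structure(list(lengths = diff(c(0L, i)), values = x[i]),
              class = "rle_consec")
}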
and using your example
> x <- c(3:5, 10:15, 17, 22, 23, 35:40)
> rle_consec(x)
$lengths
[1] 3 6 1 2 6
$values
[1] 5 15 17 23 40
attr(,"class")
[1] "rle_consec"
which is what John expected.
You could adjust the code further to give the first of each consecutive subsequence rather than the last.
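A minimal sketch of that adjustment, assuming plain list output is acceptable: the function name rle_consec_first is hypothetical, and the input check and class attribute from the version above are left out for brevity. The only real change is indexing x at the start of each run instead of the end.

```r
rle_consec_first <- function(x) {
  n <- length(x)
  if (n == 0L)
    return(list(lengths = integer(), values = x))
  # TRUE where the next element is not the current one plus 1
  y <- x[-1L] != x[-n] + 1
  # Indices of the last element of each run
  i <- c(which(y | is.na(y)), n)
  lens <- diff(c(0L, i))
  # Step back from the last index of each run to its first index
  list(lengths = lens, values = x[i - lens + 1L])
}

x <- c(3:5, 10:15, 17, 22, 23, 35:40)
res <- rle_consec_first(x)
res$lengths   # 3 6 1 2 6
res$values    # 3 10 17 22 35
```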