 

rle-like function that catches a "run" of adjacent integers

Tags:

r

I'm pretty sure that you all agree that rle is one of those "gotcha" functions in R. Is there any similar function that can "catch" a "run" of adjacent integer values?

So, if I have a vector like this one:

x <- c(3:5, 10:15, 17, 22, 23, 35:40)

and I call that esoteric function, I'll get a response like this one:

lengths: 3, 6, 1, 2, 6
values: (3,4,5), (10,11,12... # you get the point

It's not that hard to write a function like this, but still... any ideas?

asked Dec 11 '11 19:12 by aL3xa


3 Answers

1) Calculate values and then lengths based on values

s <- split(x, cumsum(c(0, diff(x) != 1)))
run.info <- list(lengths = unname(sapply(s, length)), values = unname(s))

Running it using x from the question gives this:

> str(run.info)
List of 2
 $ lengths: int [1:5] 3 6 1 2 6
 $ values :List of 5
  ..$ : num [1:3] 3 4 5
  ..$ : num [1:6] 10 11 12 13 14 15
  ..$ : num 17
  ..$ : num [1:2] 22 23
  ..$ : num [1:6] 35 36 37 38 39 40
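To make the grouping step in solution 1) more visible, here is a small illustration (not part of the original answer; the intermediate names `breaks` and `grp` are just for display). `diff(x) != 1` marks every place where a run breaks, and `cumsum()` turns those marks into a group id that increases by one at each break:

```r
x <- c(3:5, 10:15, 17, 22, 23, 35:40)

# TRUE wherever the next element does not continue the current run
breaks <- diff(x) != 1
# Prepend 0 for the first element; cumsum() bumps the id at each break
grp <- cumsum(c(0, breaks))
print(grp)    # values: 0 0 0 1 1 1 1 1 1 2 3 3 4 4 4 4 4 4

# split() then gathers the elements that share a group id
s <- split(x, grp)
print(unname(sapply(s, length)))    # values: 3 6 1 2 6
```

Because the ids are built from `diff()`, this assumes `x` contains no `NA`s; an `NA` would propagate through `cumsum()`.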

2) Calculate lengths and then values based on lengths

Here is a second solution based on Gregor's length calculation:

lens <- rle(x - seq_along(x))$lengths 
list(lengths = lens, values = unname(split(x, rep(seq_along(lens), lens))))
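A short illustration of why the `x - seq_along(x)` trick works (my own sketch, not from the answer): inside a run, `x[i]` and `i` both increase by 1, so `x[i] - i` stays constant. Two adjacent runs can never share the same offset, because that would require the next value to be exactly one more than the previous one, which is not a break. So `rle()` on the offsets measures exactly the runs of `x`:

```r
x <- c(3:5, 10:15, 17, 22, 23, 35:40)

# Constant within each run of consecutive integers
offset <- x - seq_along(x)
print(offset)    # values: 2 2 2 6 6 6 6 6 6 7 11 11 22 22 22 22 22 22

# Runs of equal offsets are runs of consecutive integers in x
lens <- rle(offset)$lengths
print(lens)      # values: 3 6 1 2 6
```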

3) Calculate lengths and values without using the other

This one seems inefficient, since it calculates the lengths and the values each from scratch, and it is also somewhat more complex than necessary, but it does get everything down to a single statement, so I thought I would add it as well. It's basically a mix of solutions 1) and 2) above; nothing really new relative to those two.

list(lengths = rle(x - seq_along(x))$lengths,
     values = unname(split(x, cumsum(c(0, diff(x) != 1)))))
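Since the three solutions are meant to be interchangeable, a quick sanity check (a sketch; the names `sol1`, `sol2`, `sol3` are just illustrative) is to run all three on the example vector and confirm they return identical lists:

```r
x <- c(3:5, 10:15, 17, 22, 23, 35:40)

# 1) values first (group ids via cumsum), lengths from the groups
s <- split(x, cumsum(c(0, diff(x) != 1)))
sol1 <- list(lengths = unname(sapply(s, length)), values = unname(s))

# 2) lengths first (rle on the offsets), values from the lengths
lens <- rle(x - seq_along(x))$lengths
sol2 <- list(lengths = lens, values = unname(split(x, rep(seq_along(lens), lens))))

# 3) lengths and values computed independently
sol3 <- list(lengths = rle(x - seq_along(x))$lengths,
             values = unname(split(x, cumsum(c(0, diff(x) != 1)))))

# All three should agree exactly
stopifnot(identical(sol1, sol2), identical(sol1, sol3))
```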

EDIT: Added second solution.

EDIT: Added third solution.  

answered Oct 31 '22 08:10 by G. Grothendieck


How about

rle(x - 1:length(x))$lengths   
# 3 6 1 2 6

The lengths are what you want. I'm blanking on an equally clever way to get the proper values, but with cumsum() and the original x they're very accessible.
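One way to do the `cumsum()` step hinted at here (a sketch; `ends` and `starts` are illustrative names): the cumulative sum of the run lengths gives the index where each run ends, and subtracting the length then adding 1 gives where it starts, so each run can be sliced directly out of `x`:

```r
x <- c(3:5, 10:15, 17, 22, 23, 35:40)
lens <- rle(x - seq_along(x))$lengths

# Index of the last and first element of each run
ends <- cumsum(lens)
starts <- ends - lens + 1L

# Slice each run out of the original vector
values <- Map(function(from, to) x[from:to], starts, ends)
str(values)
```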

answered Oct 31 '22 08:10 by Gregor Thomas


As you say, it is easy enough to write something similar to rle. Indeed, adjusting the source of rle by adding + 1 to its comparison might give something like

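rle_consec <- function(x)
{
    if (!is.vector(x) && !is.list(x))
        stop("'x' must be an atomic vector")
    n <- length(x)
    if (n == 0L)
        return(structure(list(lengths = integer(), values = x),
                         class = "rle_consec"))
    # y[k] is TRUE when x[k + 1] does not continue the run ending at x[k]
    # (rle() itself uses x[-1L] != x[-n]; the "+ 1" is the only change)
    y <- x[-1L] != x[-n] + 1
    # Indices of the last element of each run
    i <- c(which(y | is.na(y)), n)
    structure(list(lengths = diff(c(0L, i)), values = x[i]),
              class = "rle_consec")
}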

and using your example

> x <- c(3:5, 10:15, 17, 22, 23, 35:40)
> rle_consec(x)
$lengths
[1] 3 6 1 2 6

$values
[1]  5 15 17 23 40

attr(,"class")
[1] "rle_consec"

which gives the run lengths you expected; the values are the last element of each run rather than the full runs.
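Because every run consists of consecutive integers, keeping only its last element loses no information: the full runs can be rebuilt from `lengths` and `values`. A sketch of such an inverse (the name `inverse_rle_consec` is hypothetical, in the spirit of base R's `inverse.rle`), using the output shown above:

```r
# Hypothetical helper: rebuild the original vector from an object with
# `lengths` and `values`, where each value is the LAST element of its run
inverse_rle_consec <- function(r) {
  unlist(Map(function(last, len) (last - len + 1):last,
             r$values, r$lengths))
}

r <- structure(list(lengths = c(3L, 6L, 1L, 2L, 6L),
                    values = c(5, 15, 17, 23, 40)),
               class = "rle_consec")
y <- inverse_rle_consec(r)
print(y)
```

Note that `:` returns an integer vector here, so the result compares equal to the original `x` with `==` even though `x` itself is double.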

You could adjust the code further to give the first of each consecutive subsequence rather than the last.
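A possible version of that adjustment (a sketch; the name `rle_consec_first` is hypothetical): the run detection is unchanged, but instead of recording where each run ends, it records where each run starts, which is position 1 plus every position right after a break:

```r
rle_consec_first <- function(x)
{
    if (!is.vector(x) && !is.list(x))
        stop("'x' must be an atomic vector")
    n <- length(x)
    if (n == 0L)
        return(structure(list(lengths = integer(), values = x),
                         class = "rle_consec"))
    # TRUE where x[k + 1] does not continue the run ending at x[k]
    y <- x[-1L] != x[-n] + 1
    # Index of the first element of each run
    starts <- c(1L, which(y | is.na(y)) + 1L)
    structure(list(lengths = diff(c(starts, n + 1L)), values = x[starts]),
              class = "rle_consec")
}

x <- c(3:5, 10:15, 17, 22, 23, 35:40)
r <- rle_consec_first(x)
r$lengths    # values: 3 6 1 2 6
r$values     # values: 3 10 17 22 35
```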

answered Oct 31 '22 07:10 by Henry