Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Group integer vector into consecutive runs

Tags:

r

aggregate

I have two vectors of integer. I would like to identify the intervals of consecutive integer sequences presented in the second vector conditioned by the first vector (this vector can be seen as a factor, by which the second vector can be classified into several groups).

Here I present a dummy for my problem.

The data, in one group (defined by the first vector) of the second vector, the integers monotonically increase.

my.data <- data.frame(
    V1=c(rep(1, 10), rep(2, 9), rep(3,11)), 
    V2=c(seq(2,5), seq(7,11), 13, seq(4, 9), seq(11,13), seq(1, 6), seq(101, 105))
)

What I want:

  • output the begin and end of the interval
  • here, group in the first column, the beginning integer in the second, the end integer in the third.

Expected results:

1, 2, 5 \n
1, 7, 11 \n
1, 13, 13 \n
2, 4, 9 \n
2, 11, 13 \n
3, 1, 6 \n
3, 101, 105 \n
like image 982
jianfeng.mao Avatar asked Dec 06 '11 13:12

jianfeng.mao


2 Answers

Here's a brief answer using aggregate....

runs <- cumsum( c(0, diff(my.data$V2) > 1) )
aggregate(V2 ~ runs + V1, my.data, range)[,-1]


  V1 V2.1 V2.2
1  1    2    5
2  1    7   11
3  1   13   13
4  2    4    9
5  2   11   13
6  3    1    6
7  3  101  105
like image 108
John Avatar answered Oct 29 '22 23:10

John


A while back, I wrote a variant of rle() which I named seqle() because it allows one to look for integer sequences rather than repetitions. Then, you can do:

Rgames: seqle(my.data[my.data$V1==1,2]) #repeat for my.data$V1 equal to 2 and 3
$lengths 
[1] 4 5 1 

$values 
[1]  2  7 13 

(for example). It would take a little fiddling to get these results into the tabular form you want, but just thought I'd mention it. BTW, here's the code for seqle. If you set incr=0 you get the base rle result.

function(x,incr=1){ 

    if(!is.numeric(x)) x <- as.numeric(x) 
    n <- length(x)  
    y <- x[-1L] != x[-n] + incr 
    i <- c(which(y|is.na(y)),n) 
    list( lengths = diff(c(0L,i)),  values = x[head(c(0L,i)+1L,-1L)]) 
} 

EDIT: There's an excellent upgrade to this, provided by flodel, at How to check if a vector contains n consecutive numbers . He pointed out that this version has the usual floating-point error problems when working with doubles, and provided a fix as well.

like image 30
Carl Witthoft Avatar answered Oct 30 '22 00:10

Carl Witthoft