I have a vector like the following:
xx <- c(1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1)
I want to find the indexes that have ones and combine them together. In this case, I want the output to look like 1 6 and 11 14 in a 2x2 matrix. My vector is actually very long so I can't do this by hand. Can anyone help me with this? Thanks.
Since the question originally had a tag 'bioinformatics' I'll mention the Bioconductor package IRanges (and it's companion for ranges on genomes GenomicRanges)
> library(IRanges)
> xx <- c(1,1,1,1,1,1,0,0,0,0,1,1,1,1)
> sl = slice(Rle(xx), 1)
> sl
Views on a 14-length Rle subject
views:
start end width
[1] 1 6 6 [1 1 1 1 1 1]
[2] 11 14 4 [1 1 1 1]
which could be coerced to a matrix, but that would often not be convenient for whatever the next step is
> matrix(c(start(sl), end(sl)), ncol=2)
[,1] [,2]
[1,] 1 6
[2,] 11 14
Other operations might start on the Rle
, e.g.,
> xx = c(2,2,2,3,3,3,0,0,0,0,4,4,1,1)
> r = Rle(xx)
> m = cbind(start(r), end(r))[runValue(r) != 0,,drop=FALSE]
> m
[,1] [,2]
[1,] 1 3
[2,] 4 6
[3,] 11 12
[4,] 13 14
See the help page ?Rle
for the full flexibility of the Rle
class; to go from a matrix like that above to a new Rle as asked in the comment below, one might create a new Rle of appropriate length and then subset-assign using an IRanges as index
> r = Rle(0L, max(m))
> r[IRanges(m[,1], m[,2])] = 1L
> r
integer-Rle of length 14 with 3 runs
Lengths: 6 4 4
Values : 1 0 1
One could expand this to a full vector
> as(r, "integer")
[1] 1 1 1 1 1 1 0 0 0 0 1 1 1 1
but often it's better to continue the analysis on the Rle. The class is very flexible, so one way of going from xx
to an integer vector of 1's and 0's is
> as(Rle(xx) > 0, "integer")
[1] 1 1 1 1 1 1 0 0 0 0 1 1 1 1
Again, though, it often makes sense to stay in Rle space. And Arun's answer to your separate question is probably best of all.
Performance (speed) is important, although in this case I think the Rle class provides a lot of flexibility that would weigh against poor performance, and ending up at a matrix is an unlikely end-point for a typical analysis. Nonetheles the IRanges infrastructure is performant
eddi <- function(xx)
matrix(which(diff(c(0,xx,0)) != 0) - c(0,1),
ncol = 2, byrow = TRUE)
iranges = function(xx) {
sl = slice(Rle(xx), 1)
matrix(c(start(sl), end(sl)), ncol=2)
}
iranges.1 = function(xx) {
r = Rle(xx)
cbind(start(r), end(r))[runValue(r) != 0, , drop=FALSE]
}
with
> xx = sample(c(0, 1), 1e5, TRUE)
> microbenchmark(eddi(xx), iranges(xx), iranges.1(xx), times=10)
Unit: milliseconds
expr min lq median uq max neval
eddi(xx) 45.88009 46.69360 47.67374 226.15084 234.8138 10
iranges(xx) 112.09530 114.36889 229.90911 292.84153 294.7348 10
iranges.1(xx) 31.64954 31.72658 33.26242 35.52092 226.7817 10
Something like this, maybe?
if (xx[1] == 1) {
rr <- cumsum(c(0, rle(xx)$lengths))
} else {
rr <- cumsum(rle(xx)$lengths)
}
if (length(rr) %% 2 == 1) {
rr <- head(rr, -1)
}
oo <- matrix(rr, ncol=2, byrow=TRUE)
oo[, 1] <- oo[, 1] + 1
[,1] [,2]
[1,] 1 6
[2,] 11 14
This edit takes care of cases where 1) the vector starts with a "0" rather than a "1" and 2) where the number of consecutive occurrences of 1's are odd/even. For ex: xx <- c(1,1,1,1,1,1,0,0,0,0)
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With