R split numeric vector at position

Tags: split, r, vector

I am wondering about the simple task of splitting a vector into two at a certain index:

splitAt <- function(x, pos){
  # Note: 1:pos-1 parses as (1:pos)-1, i.e. 0:(pos-1); the zero index is silently dropped
  list(x[1:pos-1], x[pos:length(x)])
}

a <- c(1, 2, 2, 3)

> splitAt(a, 4)
[[1]]
[1] 1 2 2

[[2]]
[1] 3

My question: surely there is an existing function for this, but I can't find it. Could split be a possibility? My naive implementation also does not work if pos=0 or pos>length(a).
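
To make the two failure cases concrete, here is a small sketch that repeats the naive function so it runs on its own. The comments describe what R's indexing rules do in each case.

```r
# Naive implementation from the question, repeated so the snippet is self-contained.
splitAt <- function(x, pos){
  list(x[1:pos-1], x[pos:length(x)])
}

a <- c(1, 2, 2, 3)

# pos = 0: 1:0-1 is c(0, -1), so the first piece drops element 1 instead of
# being empty, and 0:length(a) makes the second piece the whole vector.
str(splitAt(a, 0))

# pos > length(a): 5:4 counts downwards, so the second piece is c(NA, 3).
str(splitAt(a, 5))
```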

Asked May 03 '13 11:05 by user1981275


3 Answers

An improvement would be:

splitAt <- function(x, pos) unname(split(x, cumsum(seq_along(x) %in% pos)))

which can now take a vector of positions:

splitAt(a, c(2, 4))
# [[1]]
# [1] 1
# 
# [[2]]
# [1] 2 2
# 
# [[3]]
# [1] 3

It also behaves sensibly (admittedly a subjective call) if pos <= 0 or pos > length(x), in the sense that it returns the whole original vector in a single list item. If you'd like it to raise an error instead, use stopifnot at the top of the function.
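
A minimal sketch of the stopifnot variant follows. The name splitAtStrict and the exact conditions are assumptions for illustration. Here a position is treated as valid only if it actually splits something, i.e. it lies in 2..length(x).

```r
# Hypothetical strict variant: errors on positions that would not split anything.
splitAtStrict <- function(x, pos) {
  stopifnot(
    is.numeric(pos),
    !anyNA(pos),
    all(pos > 1),           # splitting at 1 (or below) leaves nothing on the left
    all(pos <= length(x))   # positions past the end cannot match any index
  )
  unname(split(x, cumsum(seq_along(x) %in% pos)))
}

a <- c(1, 2, 2, 3)
splitAtStrict(a, c(2, 4))  # same grouping as splitAt(a, c(2, 4))
try(splitAtStrict(a, 0))   # stops with an error instead of returning one piece
```

Relax the conditions if, for example, pos = 1 should be accepted and simply return the whole vector.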

Answered Oct 06 '22 22:10 by flodel


I tried to use flodel's answer, but it was too slow in my case, with a very large x and a function that has to be called repeatedly. So I wrote the following function, which is much faster but also very ugly, and it does not behave properly at the edges. In particular, it checks nothing and returns wrong results at least for pos > length(x) or pos <= 1, and perhaps in other cases as well (unsorted positions, for example), so be careful. You can add those checks yourself if you're unsure about your inputs and not too concerned about speed.

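splitAt2 <- function(x, pos) {
    # Segment boundaries: start of x, each split position, one past the end
    pos2 <- c(1, pos, length(x)+1)
    # Preallocate one list slot per segment
    out <- vector("list", length(pos2) - 1)
    for (i in seq_along(out)) {
        # Segment i runs from pos2[i] up to just before pos2[i+1]
        out[[i]] <- x[pos2[i]:(pos2[i+1]-1)]
    }
    return(out)
}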
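
If the inputs are not trusted, the checks could look roughly like the sketch below. The name splitAt2Checked and the chosen requirements (strictly increasing positions within 2..length(x)) are assumptions, not part of the function above, and the extra checks cost some of the speed.

```r
# Hypothetical checked version of the loop-based approach.
splitAt2Checked <- function(x, pos) {
  # Assumed requirements: non-empty x, strictly increasing positions in 2..length(x).
  stopifnot(length(x) > 0, !anyNA(pos), all(pos > 1), all(pos <= length(x)),
            all(diff(pos) > 0))
  pos2 <- c(1, pos, length(x) + 1)
  out <- vector("list", length(pos2) - 1)
  for (i in seq_along(out)) {
    out[[i]] <- x[pos2[i]:(pos2[i + 1] - 1)]
  }
  out
}

splitAt2Checked(c(1, 2, 2, 3), c(2, 4))
```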

However, splitAt2 runs about 20 times faster with an x of length 10^6:

library(microbenchmark)
W <- rnorm(1e6)
splits <- cumsum(rep(1e5, 9))
tm <- microbenchmark(
  splitAt(W, splits),
  splitAt2(W, splits),
  times = 10
)
tm

Answered Oct 06 '22 22:10 by Calimo


Another alternative that might be faster and/or more readable/elegant than flodel's solution:

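splitAt <- function(x, pos) {
  # Group by index, not by value: findInterval() bins seq_along(x) against the sorted positions
  unname(split(x, findInterval(seq_along(x), pos)))
}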
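
As a quick sanity check, the sketch below compares index-based grouping via findInterval with the cumsum approach from flodel's answer. The helper names byCumsum and byInterval are made up for this comparison, and pos is assumed to be sorted in increasing order, which findInterval requires.

```r
# Index-based grouping with cumsum(), as in flodel's answer.
byCumsum <- function(x, pos) unname(split(x, cumsum(seq_along(x) %in% pos)))

# Index-based grouping with findInterval(); pos is assumed sorted.
byInterval <- function(x, pos) unname(split(x, findInterval(seq_along(x), pos)))

a <- c(1, 2, 2, 3)
identical(byCumsum(a, c(2, 4)), byInterval(a, c(2, 4)))

# Passing the values of x instead of its indices groups by value,
# which is a different operation:
unname(split(a, findInterval(a, c(2, 4))))
```

The last call shows why the indices matter: binning the values of a puts 2, 2 and 3 in the same group.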

Answered Oct 06 '22 22:10 by Joshua Ulrich