
Split vector separated by n zeros into different groups

Tags:

r

I have a vector x

x = c(1, 1, 2.00005, 1, 1, 0, 0, 0, 0, 1, 2, 0, 3, 4, 0, 0, 0, 0, 1, 2, 3, 1, 3)

I need to split values separated by n (in this case, assume n is 3) or more zeros into different groups.

Desired output would be

list(x1 = c(1, 1, 2.00005, 1, 1),
     x2 = c(1, 2, 0, 3, 4),
     x3 = c(1, 2, 3, 1, 3))
#$x1
#[1] 1.00000 1.00000 2.00005 1.00000 1.00000

#$x2
#[1] 1 2 0 3 4

#$x3
#[1] 1 2 3 1 3

The following does not work because it starts a new group at every zero, even when a run contains fewer than n zeros.

temp = cumsum(x == 0)
split(x[x!=0], temp[x!=0])
#$`0`
#[1] 1.00000 1.00000 2.00005 1.00000 1.00000

#$`4`
#[1] 1 2

#$`5`
#[1] 3 4

#$`9`
#[1] 1 2 3 1 3
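
To see why, it can help to look at the lengths of the zero runs directly. This is only a diagnostic sketch using base R's rle; the names `runs` and `zero_runs` are made up for illustration.

```r
x <- c(1, 1, 2.00005, 1, 1, 0, 0, 0, 0, 1, 2, 0, 3, 4, 0, 0, 0, 0, 1, 2, 3, 1, 3)

# Run-length encode the zero / non-zero indicator
runs <- rle(x == 0)

# Keep only the lengths of the runs that consist of zeros
zero_runs <- runs$lengths[runs$values]
zero_runs
#[1] 4 1 4
```

The middle run has length 1, which is below n = 3, so it should stay inside a group. cumsum(x == 0) still increments there, which is where the unwanted split comes from.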
d.b asked Aug 30 '17 19:08


2 Answers

Yet another solution using rle (twice) and inverse.rle.

n <- 3
r <- rle(as.integer(x == 0))
r$values[r$values == 1 & r$lengths < n] <- 0
r <- rle(inverse.rle(r))

group <- integer(length(x))
start <- 1
for(i in seq_along(r$values)){
    group[start:(start + r$lengths[i] - 1)] <- c(1L, rep(0L, r$lengths[i] - 1))
    start <- start + r$lengths[i]
}

In the meantime I realized that the code that prepares the loop above, and the loop itself, could be greatly simplified. For completeness, I repeat the initial lines of code.

n <- 3
r <- rle(as.integer(x == 0))
r$values[r$values == 1 & r$lengths < n] <- 0

# This is the simplification
group <- c(1L, diff(inverse.rle(r)) != 0)

res <- split(x, cumsum(group))
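# drop the groups made up only of separator zeros
# (logical indexing keeps every group when there is none to drop)
res <- res[!sapply(res, function(y) all(y == 0))]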
res
#$`1`
#[1] 1.00000 1.00000 2.00005 1.00000 1.00000
#
#$`3`
#[1] 1 2 0 3 4
#
#$`5`
#[1] 1 2 3 1 3
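
The same idea can be wrapped in a function so that n is a parameter. This is a minimal sketch rather than part of the answer itself: the name `split_by_zero_runs` is hypothetical, and it drops the separator positions before splitting instead of filtering all-zero groups afterwards.

```r
# Hypothetical helper: split x wherever n or more consecutive zeros occur
split_by_zero_runs <- function(x, n) {
    r <- rle(as.integer(x == 0))
    # Zero runs shorter than n are treated as ordinary values
    r$values[r$values == 1 & r$lengths < n] <- 0
    # TRUE for every position inside a separating run
    is_sep <- inverse.rle(r) == 1
    # A new group starts wherever the separator indicator changes
    group <- cumsum(c(1L, diff(is_sep) != 0))
    # Drop the separator positions, then split what is left
    unname(split(x[!is_sep], group[!is_sep]))
}

x <- c(1, 1, 2.00005, 1, 1, 0, 0, 0, 0, 1, 2, 0, 3, 4, 0, 0, 0, 0, 1, 2, 3, 1, 3)
res_fun <- split_by_zero_runs(x, 3)
```

For the example vector this returns the three groups from the desired output as an unnamed list. Because the separators are removed by logical index, a vector with no run of n zeros comes back as a single group.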
Rui Barradas answered Sep 26 '22 01:09


Here is a method with rle, split, and lapply

n <- 3
# get RLE
temp <- rle(x)
# replace values with grouping variables: each run of n or more 0s starts a new group
temp$values <- cumsum(temp$values == 0 & temp$lengths >= n)

# split on group and lapply through, dropping the 0s at the beginning of each group
# (cummax(y) > 0 assumes the non-zero values are positive)
lapply(split(x, inverse.rle(temp)), function(y) y[cummax(y) > 0])
#$`0`
#[1] 1.00000 1.00000 2.00005 1.00000 1.00000
#
#$`1`
#[1] 1 2 0 3 4
#
#$`2`
#[1] 1 2 3 1 3
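
To see what the split is keyed on, the grouping vector can be printed before splitting. A short sketch, where the name `grp` is only for illustration and `>= n` with n = 3 is the same condition as `> 2`:

```r
x <- c(1, 1, 2.00005, 1, 1, 0, 0, 0, 0, 1, 2, 0, 3, 4, 0, 0, 0, 0, 1, 2, 3, 1, 3)
n <- 3

temp <- rle(x)
# The counter increases at every run of n or more zeros
temp$values <- cumsum(temp$values == 0 & temp$lengths >= n)

# Expand back to one group id per element of x
grp <- inverse.rle(temp)
grp
#[1] 0 0 0 0 0 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2
```

Each separating run of zeros lands at the start of the group that follows it, which is why the leading zeros are stripped with cummax(y) > 0. That filter relies on the values being non-negative: leading negative values would be dropped as well.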

A second method without lapply is as follows

n <- 3
# get RLE
temp <- rle(x)
# get positions of the runs of n or more 0s that force grouping
changes <- which(temp$values == 0 & temp$lengths >= n)
# get group indicators
temp$values <- cumsum(temp$values == 0 & temp$lengths >= n)
# move the separating 0s into a new group of their own
temp$values[changes] <- max(temp$values) + 1L

# create list
split(x, inverse.rle(temp))
#$`0`
#[1] 1.00000 1.00000 2.00005 1.00000 1.00000
#
#$`1`
#[1] 1 2 0 3 4
#
#$`2`
#[1] 1 2 3 1 3
#
#$`3`
#[1] 0 0 0 0 0 0 0 0

Finally, drop the last list item, which holds the separator zeros, with head(split(x, inverse.rle(temp)), -1).
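
A variant that avoids the final head() step is to mark the separator runs as NA, since split() drops elements whose grouping value is missing. This swaps in a different technique from the one above and is only a sketch; the function name `split_no_lapply` is hypothetical.

```r
# Hypothetical helper: same grouping idea, separators removed via NA groups
split_no_lapply <- function(x, n) {
    temp <- rle(x)
    # TRUE for runs of n or more zeros
    sep <- temp$values == 0 & temp$lengths >= n
    # Group counter, as in the method above
    temp$values <- cumsum(sep)
    # split() discards elements whose group is NA
    temp$values[sep] <- NA
    unname(split(x, inverse.rle(temp)))
}

x <- c(1, 1, 2.00005, 1, 1, 0, 0, 0, 0, 1, 2, 0, 3, 4, 0, 0, 0, 0, 1, 2, 3, 1, 3)
res_na <- split_no_lapply(x, 3)
```

With this approach nothing has to be removed by position afterwards, so a vector without any run of n zeros is returned as one complete group rather than losing its last element.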

lmo answered Sep 26 '22 01:09