I have a vector x
x = c(1, 1, 2.00005, 1, 1, 0, 0, 0, 0, 1, 2, 0, 3, 4, 0, 0, 0, 0, 1, 2, 3, 1, 3)
I need to split values separated by n (in this case, assume n is 3) or more zeros into different groups.
Desired output would be
list(x1 = c(1, 1, 2.00005, 1, 1),
x2 = c(1, 2, 0, 3, 4),
x3 = c(1, 2, 3, 1, 3))
#$x1
#[1] 1.00000 1.00000 2.00005 1.00000 1.00000
#$x2
#[1] 1 2 0 3 4
#$x3
#[1] 1 2 3 1 3
The following does not work because it splits x even when there are fewer than n zeros in a group.
temp = cumsum(x == 0)
split(x[x!=0], temp[x!=0])
#$`0`
#[1] 1.00000 1.00000 2.00005 1.00000 1.00000
#$`4`
#[1] 1 2
#$`5`
#[1] 3 4
#$`9`
#[1] 1 2 3 1 3
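To see why this over-splits, it can help to look at the run lengths of the zero / non-zero indicator. A small sketch using base R's rle (the name `runs` is only illustrative, not part of the question):

```r
x <- c(1, 1, 2.00005, 1, 1, 0, 0, 0, 0, 1, 2, 0, 3, 4, 0, 0, 0, 0, 1, 2, 3, 1, 3)

# Run-length encode the zero / non-zero indicator
runs <- rle(x == 0)

# One row per run: is it a run of zeros, and how long is it?
data.frame(is_zero = runs$values, length = runs$lengths)
```

The lone zero between 2 and 3 forms a run of length 1, which is shorter than n = 3, yet cumsum(x == 0) still increments there and so starts a new group.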
Yet another solution using rle (twice) and inverse.rle.
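n <- 3
r <- rle(as.integer(x == 0))
# Zero runs shorter than n are not separators: relabel them as non-zero
r$values[r$values == 1 & r$lengths < n] <- 0
# Re-encode so that adjacent runs with the same label are merged
r <- rle(inverse.rle(r))
group <- integer(length(x))
start <- 1
# Mark the first element of every run with 1, the rest with 0
for (i in seq_along(r$values)) {
  group[start:(start + r$lengths[i] - 1)] <- c(1L, rep(0L, r$lengths[i] - 1))
  start <- start + r$lengths[i]
}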
In the meantime, I realized that the code that prepares the loop above, and the loop itself, could be greatly simplified. To make the answer complete, I will repeat the initial lines of code.
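n <- 3
r <- rle(as.integer(x == 0))
r$values[r$values == 1 & r$lengths < n] <- 0
# This is the simplification
group <- c(1L, diff(inverse.rle(r)) != 0)
res <- split(x, cumsum(group))
# Drop the all-zero separator groups; logical indexing is safe even when there are none
# (res[-which(...)] would return an empty list in that case)
res <- res[!sapply(res, function(y) all(y == 0))]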
res
#$`1`
#[1] 1.00000 1.00000 2.00005 1.00000 1.00000
#
#$`3`
#[1] 1 2 0 3 4
#
#$`5`
#[1] 1 2 3 1 3
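The steps above can be wrapped in a small function so that n becomes a parameter. This is only a sketch: the name `split_on_zero_runs` is hypothetical, and instead of filtering out all-zero groups afterwards it drops the separator elements before splitting.

```r
# Hypothetical helper (not from the original answer):
# split x wherever there is a run of n or more zeros.
split_on_zero_runs <- function(x, n = 3) {
  r <- rle(as.integer(x == 0))
  # Zero runs shorter than n count as ordinary data
  r$values[r$values == 1 & r$lengths < n] <- 0
  sep <- inverse.rle(r)          # 1 = separator zero, 0 = data
  is_sep <- sep == 1
  # A new group starts wherever the separator status changes
  group <- cumsum(c(TRUE, diff(sep) != 0))
  # Keep only the data elements, split by their group id
  unname(split(x[!is_sep], group[!is_sep]))
}

x <- c(1, 1, 2.00005, 1, 1, 0, 0, 0, 0, 1, 2, 0, 3, 4, 0, 0, 0, 0, 1, 2, 3, 1, 3)
res3 <- split_on_zero_runs(x, n = 3)
res1 <- split_on_zero_runs(x, n = 1)
```

With n = 3 the single zero inside 1 2 0 3 4 is kept as data; with n = 1 every zero acts as a separator, so that group is split in two.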
Here is a method with rle, split, and lapply
# get RLE
temp <- rle(x)
# replace values with grouping variables
temp$values <- cumsum(temp$values == 0 & temp$lengths > 2)
# split on group and lapply through, dropping 0s at beginning which are start of each group
lapply(split(x, inverse.rle(temp)), function(y) y[cummax(y) > 0])
$`0`
[1] 1.00000 1.00000 2.00005 1.00000 1.00000
$`1`
[1] 1 2 0 3 4
$`2`
[1] 1 2 3 1 3
A second method without lapply is as follows
# get RLE
temp <- rle(x)
# get positions of 0s that force grouping
changes <- which(temp$values == 0 & temp$lengths > 2)
# get group indicators
temp$values <- cumsum(temp$values == 0 & temp$lengths > 2)
# make 0s a new group
temp$values[changes] <- max(temp$values) + 1L
# create list
split(x, inverse.rle(temp))
$`0`
[1] 1.00000 1.00000 2.00005 1.00000 1.00000
$`1`
[1] 1 2 0 3 4
$`2`
[1] 1 2 3 1 3
$`3`
[1] 0 0 0 0 0 0 0 0
Finally, you'd just drop the last list item (the group that collects the separator zeros), like head(split(x, inverse.rle(temp)), -1).
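As a possible variant of this second method (an assumption on my part, not something from the answer itself), the separator runs can be labelled NA instead of being given their own group. split drops elements whose grouping value is NA, so no trailing list item has to be removed, and `n` replaces the hard-coded `> 2`:

```r
x <- c(1, 1, 2.00005, 1, 1, 0, 0, 0, 0, 1, 2, 0, 3, 4, 0, 0, 0, 0, 1, 2, 3, 1, 3)
n <- 3  # minimum run of zeros that separates groups

temp <- rle(x)
# Runs of n or more zeros are the separators
is_sep <- temp$values == 0 & temp$lengths >= n
# Data runs get a running group counter; separator runs get NA
temp$values <- ifelse(is_sep, NA, cumsum(is_sep))
# split() silently drops the NA-labelled (separator) elements
res <- split(x, inverse.rle(temp))
```

This also behaves sensibly when x contains no separator run at all, in which case res holds a single group.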