I want to split up my data into groups of successive rows that pass some test. Here's an example:
set.seed(1)
n <- 29
ok <- sample(c(TRUE,FALSE),n,replace=TRUE,prob=c(.7,.3))
vec <- (1:n)[ok]
# [1] 1 2 3 5 8 9 10 11 12 13 14 16 19 22 23 24 25 26 27 28
The desired output is "vec" grouped into contiguous sequences:
out <- list(1:3,5,8:14,16,19,22:28)
This works:
nv <- length(vec)
splits <- 1 + which(diff(vec) != 1)
splits <- c(1,splits,nv+1)
nsp <- length(splits)
out <- list()
for (i in 1:(nsp-1)){
out[[i]] <- vec[splits[i]:(splits[i+1]-1)]
}
I am guessing there is a cleaner way in base R...? I'm not yet adept with the rle and cumsum tricks I've seen on SO...
Here's a cumsum "trick" for you:
split(vec, cumsum(c(1, diff(vec)) - 1))
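As a quick sanity check (my own sketch, not part of the original post), here is the one-liner applied to the vec printed in the question, typed in by hand rather than regenerated from the seed. The list names come from the cumulative sum, so they are dropped before comparing against the desired out:

```r
# vec as printed in the question, entered by hand as integers
vec <- c(1:3, 5L, 8:14, 16L, 19L, 22:28)

# a gap of size g adds g - 1 to the running sum, so every gap starts a new group
groups <- split(vec, cumsum(c(1, diff(vec)) - 1))

# desired output from the question
out <- list(1:3, 5L, 8:14, 16L, 19L, 22:28)

# compare values only; the names of groups are the cumsum values
identical(unname(groups), out)
```

Note that this relies on vec being sorted and free of duplicates, which holds for anything built as (1:n)[ok].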
Update: here is a simple example using your version, split(vec, cumsum(c(0, diff(vec) > 1))), with each step broken down:
vec <- c(1:3,7:9) # 1 2 3 7 8 9 (sample with two contiguous sequences)
diff(vec) # 1 1 4 1 1 (lagged difference)
diff(vec) > 1 # F F T F F (not contiguous where diff > 1)
# 0 0 1 0 0 (numeric equivalent for T/F)
c(0, diff(vec) > 1) # 0 0 0 1 0 0 (pad with 0 to align with original vector)
cumsum(c(0, diff(vec) > 1)) # 0 0 0 1 1 1 (cumulative sum of logical values)
groups <- cumsum(c(0, diff(vec) > 1)) # 0 0 0 1 1 1
sets <- split(vec, groups) # split into groups named by cumulative sum
sets
# $`0`
# [1] 1 2 3
#
# $`1`
# [1] 7 8 9
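If you need this in more than one place, the steps above could be wrapped in a small helper. This is only a sketch: split_runs is a made-up name, it assumes x is an integer vector sorted in increasing order, it uses != 1 as in the question's loop, and it drops the cumsum-derived names.

```r
# split_runs: hypothetical helper, not from base R
split_runs <- function(x) {
  # nothing to split; return an empty list
  if (length(x) == 0L) return(list())
  # a new group starts wherever the step from the previous element is not 1
  unname(split(x, cumsum(c(0, diff(x) != 1))))
}

split_runs(c(1:3, 7:9))
```

A single-element input works too, because diff() then returns a zero-length vector and the only group id is the padding 0.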
And then if you want to output it for some reason:
# Create strings representing each contiguous range
set_strings <- sapply(sets, function(x) paste0(min(x),":",max(x)))
set_strings
# 0 1
# "1:3" "7:9"
# Print out a concise representation of all contiguous sequences
print(paste0(set_strings,collapse=","))
# [1] "1:3,7:9"
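One thing to watch with the question's data: a group of length one would print as "5:5" with the min/max approach. A possible variant, assuming you would rather see a bare "5" for singletons (fmt_range is a made-up helper name, and vec is typed in by hand from the question):

```r
# vec as printed in the question, entered by hand as integers
vec <- c(1:3, 5L, 8:14, 16L, 19L, 22:28)
sets <- split(vec, cumsum(c(0, diff(vec) > 1)))

# fmt_range: hypothetical helper; singletons print as a bare number
fmt_range <- function(x) {
  if (length(x) == 1L) as.character(x) else paste0(min(x), ":", max(x))
}

# vapply guarantees one string per group
set_strings <- vapply(sets, fmt_range, character(1))
paste(set_strings, collapse = ",")
# [1] "1:3,5,8:14,16,19,22:28"
```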