I have a vector:
a <- c(1, 1, 0, 0, 1, 2, 0, 0)
I would like to get the start and end indexes of each run of consecutive equal values, sorted by value:
number start end
     0     3   4
     0     7   8
     1     1   2
     1     5   5
     2     6   6
A solution using base R:
a <- c(1,1,0,0,1,2,0,0)
# Get run length encoding
b <- rle(a)
# Create a data frame
dt <- data.frame(number = b$values, lengths = b$lengths)
# Get the end
dt$end <- cumsum(dt$lengths)
# Get the start
dt$start <- dt$end - dt$lengths + 1
# Select columns
dt <- dt[, c("number", "start", "end")]
# Sort rows
dt <- dt[order(dt$number), ]
dt
#   number start end
# 2      0     3   4
# 5      0     7   8
# 1      1     1   2
# 3      1     5   5
# 4      2     6   6
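To see where start and end come from, it can help to print the intermediate pieces. This small illustration repeats the same arithmetic on the vector a from the question: end is the running total of the run lengths, and start is end minus the run length plus one. The names ends and starts are just illustrative.

```r
a <- c(1, 1, 0, 0, 1, 2, 0, 0)
b <- rle(a)

# Run values and their lengths, in order of appearance
b$values    # 1 0 1 2 0
b$lengths   # 2 2 1 1 2

# Last index of each run: running total of the lengths
ends <- cumsum(b$lengths)       # 2 4 5 6 8
# First index of each run: step back (length - 1) from the end
starts <- ends - b$lengths + 1  # 1 3 5 6 7
```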
Here is a solution using with() to make the code more concise:
with(rle(a), data.frame(number = values,
                        start = cumsum(lengths) - lengths + 1,
                        end = cumsum(lengths))[order(values), ])
#   number start end
# 2      0     3   4
# 5      0     7   8
# 1      1     1   2
# 3      1     5   5
# 4      2     6   6
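If you would rather not go through rle(), the same boundaries can be found directly with diff(): a run ends wherever the next value differs, and the last run ends at the final position. This is a sketch of that alternative (not one of the answers above), assuming a non-empty numeric vector without NA values; ends, starts and res are illustrative names.

```r
a <- c(1, 1, 0, 0, 1, 2, 0, 0)

# A run ends where the following value is different, plus at the last
# position (assumes a is non-empty and contains no NA)
ends <- c(which(diff(a) != 0), length(a))
# Each run starts one position after the previous run ends
starts <- c(1L, head(ends, -1) + 1L)

res <- data.frame(number = a[ends], start = starts, end = ends)
res[order(res$number), ]
```

The rows come out in the same order as the rle() versions, because order() leaves ties in their original order.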
By using dplyr and rleid() from data.table:
library(data.table)
library(dplyr)

a <- c(1, 1, 0, 0, 1, 2, 0, 0)
df <- data.frame(number = a)
# Give each run of consecutive equal values its own id
df$Id <- data.table::rleid(df$number)
# Position of each element in the original vector
df$rowname <- seq_along(a)

df %>%
  group_by(Id, number) %>%
  summarise(start = first(rowname), end = last(rowname)) %>%
  arrange(number)
# Groups:   Id [5]
     Id number start   end
  <int>  <dbl> <int> <int>
1     2      0     3     4
2     5      0     7     8
3     1      1     1     2
4     3      1     5     5
5     4      2     6     6
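The key step here is rleid(), which gives every run of consecutive equal values its own integer id. For a vector without NA values, a base R expression along these lines should produce the same ids; it is an illustration of the idea, not how data.table implements it, and run_id is an illustrative name.

```r
a <- c(1, 1, 0, 0, 1, 2, 0, 0)

# TRUE at the first element and wherever the value changes;
# the running count of TRUEs is the run id (assumes no NA in a)
run_id <- cumsum(c(TRUE, diff(a) != 0))
run_id  # 1 1 2 2 3 4 5 5
```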
A solution using a for loop in base R:
a <- c(1, 1, 0, 0, 1, 2, 0, 0)
start <- 1
res <- data.frame()
v <- c(a, -1) # append a sentinel; it must differ from every value in a
for (index in 1:(length(v) - 1)) {
  # A run ends where the next value differs
  if (v[index] != v[index + 1]) {
    res <- rbind(res,
                 data.frame(element = v[index], start = start, stop = index))
    start <- index + 1
  }
}
res
Which gives:
  element start stop
1       1     1    2
2       0     3    4
3       1     5    5
4       2     6    6
5       0     7    8
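Growing a data frame with rbind() inside a loop copies it on every iteration, which is fine for short vectors but gets slow for long ones. Below is a sketch of the same loop that fills preallocated vectors instead and checks for the last element explicitly, so it needs no sentinel and also works when the vector contains -1. run_bounds() is a hypothetical helper name, and the code assumes a vector without NA values.

```r
# Hypothetical helper: element, start and stop of each run
run_bounds <- function(x) {
  n <- length(x)
  # Preallocate for the worst case (every element is its own run)
  elements <- x
  starts <- integer(n)
  stops <- integer(n)
  k <- 0L          # number of runs found so far
  run_start <- 1L
  for (i in seq_len(n)) {
    # A run ends at the last element or where the next value differs;
    # || short-circuits, so x[i + 1] is never read past the end
    if (i == n || x[i] != x[i + 1]) {
      k <- k + 1L
      elements[k] <- x[i]
      starts[k] <- run_start
      stops[k] <- i
      run_start <- i + 1L
    }
  }
  # Keep only the slots that were filled
  keep <- seq_len(k)
  data.frame(element = elements[keep], start = starts[keep], stop = stops[keep])
}

run_bounds(c(1, 1, 0, 0, 1, 2, 0, 0))
```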