I have the following data frame:
a <- seq(1:14)
b <- c(0, 0, "start", 0, 0, 0, "end", 0, 0, "start", 0, "end", 0, 0)
df <- data.frame(a, b)
df
a b
1 0
2 0
3 start
4 0
5 0
6 0
7 end
8 0
9 0
10 start
11 0
12 end
13 0
14 0
Now, what I want to do is recode the values in b that lie between "start" and "end" as 1, so that:
df
a b
1 0
2 0
3 start
4 1
5 1
6 1
7 end
8 0
9 0
10 start
11 1
12 end
13 0
14 0
So far, I haven't got any working code. I tried which(), plus between() and inrange() from the data.table package, but I couldn't really figure it out. Any ideas how to solve this?
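Given a data frame whose b column is character rather than factor (before R 4.0.0, data.frame() defaulted to stringsAsFactors = TRUE):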
df <- data.frame(a, b, stringsAsFactors = FALSE)
# ^^^^^^^^^^^^^^^^^^^^^^^^
We can do
idx <- (cumsum(b == "start") - cumsum(b == "end") - (b == "start")) == 1
df <- transform(df, b = replace(b, idx, "1"))
df
# a b
#1 1 0
#2 2 0
#3 3 start
#4 4 1
#5 5 1
#6 6 1
#7 7 end
#8 8 0
#9 9 0
#10 10 start
#11 11 1
#12 12 end
#13 13 0
#14 14 0
idx is TRUE for the elements strictly between "start" and "end".
Computing cumsum(b == "start") - cumsum(b == "end") gets us almost there:
cumsum(b == "start") - cumsum(b == "end")
# [1] 0 0 1 1 1 1 0 0 0 1 1 0 0 0
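We only need to set the positions where b == "start" back to zero. Note the parentheses: `-` binds more tightly than `==`, so without them R would try to compute `- b` on a character vector and fail.
cumsum(b == "start") - cumsum(b == "end") - (b == "start")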
# [1] 0 0 0 1 1 1 0 0 0 0 1 0 0 0
Testing whether this vector equals 1 turns it into a logical index:
idx <- (cumsum(b == "start") - cumsum(b == "end") - (b == "start")) == 1
Result
idx
[1] FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
We then use this logical vector to replace the corresponding elements of b with "1".
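The same cumsum() logic can be wrapped in a small reusable function. This is a minimal sketch: the name recode_between() and its arguments are hypothetical, and it assumes the markers come in non-nested start/end pairs (with a nested "start", the inner marker itself would be overwritten).

```r
# Hypothetical helper (not from the original answer): wraps the cumsum()
# logic so it can be reused on any character vector with start/end markers.
# Assumes start/end pairs are complete and not nested.
recode_between <- function(x, start = "start", end = "end", value = "1") {
  is_start <- x == start
  is_end   <- x == end
  # running count: 1 from a "start" up to (but not including) its "end"
  depth <- cumsum(is_start) - cumsum(is_end)
  # subtract the "start" rows themselves so only the interior equals 1
  inside <- (depth - is_start) == 1
  replace(x, inside, value)
}

b <- c(0, 0, "start", 0, 0, 0, "end", 0, 0, "start", 0, "end", 0, 0)
recode_between(b)
```

With the character-column data frame from above, df$b <- recode_between(df$b) should give the same result as the transform() call.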
A more compact answer, from @RonakShah's comment:
df$b[unlist(mapply(`:`, which(df$b == "start") + 1, which(df$b == "end") - 1))] <- 1
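One caveat with the `:` operator here (the same applies to the lapply and loop versions): when a "start" is immediately followed by its "end", `(start + 1):(end - 1)` counts downwards instead of giving an empty range, so the markers themselves get overwritten. A sketch of a guard, assuming such adjacent markers can occur in your data (b2 is a made-up example vector), swaps `:` for seq_len():

```r
# Made-up example: the first "start" is directly followed by its "end".
b2 <- c(0, "start", "end", 0, "start", 0, "end")
s <- which(b2 == "start")  # 2 5
e <- which(b2 == "end")    # 3 7

# With `:`, (2 + 1):(3 - 1) is 3:2, i.e. c(3, 2): it counts down and
# would hit the "end" and "start" markers themselves.
(s[1] + 1):(e[1] - 1)

# seq_len(0) is integer(0), so an empty gap contributes no positions.
inner <- unlist(Map(function(from, to) from + seq_len(to - from - 1), s, e))
b2[inner] <- 1
b2
```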
Original Answer
This uses the same logic as the compact answer above, but with lapply: find the start and end positions, pair them up into a list with Map, build the index of the positions in between, then replace those elements with 1:
starting <- which(b == "start")
ending <- which(b == "end")
my.ls <- lapply(Map(c, starting, ending), function(x) (x[1]+1):(x[2]-1))
index <- unlist(my.ls)
b[index] <- 1
df <- data.frame(a, b)
df
a b
1 1 0
2 2 0
3 3 start
4 4 1
5 5 1
6 6 1
7 7 end
8 8 0
9 9 0
10 10 start
11 11 1
12 12 end
13 13 0
14 14 0
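On R 4.0.0 or later, where sequence() gained its from argument, the Map()/lapply() step can be collapsed into a single vectorised call. This is an alternative sketch rather than part of the original answer; unlike `:`, it produces an empty run when a "start" is directly followed by its "end".

```r
# Requires R >= 4.0.0 for the `from` argument of sequence().
b <- c(0, 0, "start", 0, 0, 0, "end", 0, 0, "start", 0, "end", 0, 0)
starting <- which(b == "start")   # 3 10
ending   <- which(b == "end")     # 7 12

# Run lengths = number of rows strictly between each pair (3 and 1);
# each run begins on the row right after its "start".
index <- sequence(ending - starting - 1, from = starting + 1)

b[index] <- 1
df <- data.frame(a = 1:14, b)
df
```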
Old loop answer
You can use which() as follows: first find all the starting and ending points, then loop through them and set the values in between to 1.
a <- seq(1:14)
b <- c(0, 0, "start", 0, 0, 0, "end", 0, 0, "start", 0, "end", 0, 0)
starting <- which(b == "start")
ending <- which(b == "end")
for (i in seq_along(starting)){  # seq_along() also handles zero "start" markers
index <- (starting[i]+1):(ending[i]-1)
b[index] <- 1
}
df <- data.frame(a, b)
df
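The loop, like the other answers, pairs the i-th "start" with the i-th "end", so it can fail or give wrong results when a marker is missing or the pairs are nested. A minimal sketch of a check you could run first; markers_ok() is a hypothetical name, not part of any package:

```r
# Hypothetical helper (not part of the original answers): TRUE only if the
# "start"/"end" markers form complete, ordered, non-nested pairs.
markers_ok <- function(b) {
  starting <- which(b == "start")
  ending   <- which(b == "end")
  # every "start" needs a matching "end"
  if (length(starting) != length(ending)) return(FALSE)
  if (length(starting) == 0) return(TRUE)
  # each "start" must come before its own "end" ...
  if (any(starting >= ending)) return(FALSE)
  # ... and each "end" must come before the next "start" (no nesting)
  all(ending[-length(ending)] < starting[-1])
}

b <- c(0, 0, "start", 0, 0, 0, "end", 0, 0, "start", 0, "end", 0, 0)
markers_ok(b)                                     # well-formed pairs
markers_ok(c(0, "start", 0))                      # no closing "end"
markers_ok(c("start", 0, "start", "end", "end"))  # nested pairs
```

Calling stopifnot(markers_ok(b)) before the loop would stop with an error on malformed input instead of silently recoding the wrong rows.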