I have a vector (in a data frame) filled with increasing numbers. I would like to find all consecutive numbers and replace them with the first number from the series. Is this possible to do without a loop?
My input data is:
V1
1
4
5
7
10
15
16
17
20
The output I would like is:
V1 Out
1 1
4 4
5 4
7 7
10 10
15 15
16 15
17 15
20 20
So far, I have managed to compute the difference between consecutive rows using diff() and then loop through the vector to replace the right values.
V1 <- c(1, 4, 5, 7, 10, 15, 16, 17, 20)
df <- data.frame(V1)
df$diff <- c(0, diff(df$V1) == 1)
df$Out <- NA
for (j in 1:(nrow(df))){
  if (df$diff[j] == 0){
    df$Out[j] <- df$V1[j]
  } else {
    df$Out[j] <- df$V1[max(which(df$diff[1:j] == 0))]
  }
}
It does the job, but it is very inefficient. Is there a way to get rid of the loop and make this code fast?
Thank you very much!
Using base R (here d1 is the data frame from the question, called df there), you can do:
with(d1, ave(V1, cumsum(c(1, diff(V1) != 1)), FUN = function(i) i[1]))
#[1] 1 4 4 7 10 15 15 15 20
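To see why the grouping key works, it may help to print the intermediate vectors. The sketch below rebuilds the example data as `d1` (assumed to be the same data frame as `df` in the question) and walks through each step; the names `breaks`, `grp` and `out` are introduced here only for illustration.

```r
# Example data from the question; `d1` is assumed to match `df` above
d1 <- data.frame(V1 = c(1, 4, 5, 7, 10, 15, 16, 17, 20))

# TRUE wherever a value does not continue the previous run
# (the first element always starts a run)
breaks <- c(TRUE, diff(d1$V1) != 1)

# Running count of run starts: each consecutive series gets its own id
grp <- cumsum(breaks)
grp
#[1] 1 2 2 3 4 5 5 5 6

# Within each group, repeat the first value
out <- ave(d1$V1, grp, FUN = function(i) i[1])
out
#[1]  1  4  4  7 10 15 15 15 20
```

The dplyr and data.table versions rely on the same `cumsum(c(1, diff(V1) != 1))` key; only the grouping syntax differs.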
dplyr
library(dplyr)
d1 %>%
  group_by(grp = cumsum(c(1, diff(V1) != 1))) %>%
  mutate(out = first(V1))
data.table
library(data.table)
setDT(d1)[, out := first(V1), by = cumsum(c(1, diff(V1) != 1))]
Another option, in 3 steps, using the zoo package:
Define V2 as V1:
df$V2 <- df$V1
Replace the consecutive values (where diff is 1) with NA:
df$V2[c(FALSE, diff(df$V1)==1)] <- NA
Finally, use zoo::na.locf to replace the NAs with the last non-NA value:
library(zoo)
df$V2 <- na.locf(df$V2)
Output:
df
# V1 V2
# 1 1 1
# 2 4 4
# 3 5 4
# 4 7 7
# 5 10 10
# 6 15 15
# 7 16 15
# 8 17 15
# 9 20 20
The same approach written in one line, using magrittr:
library(magrittr)
library(zoo)  # provides na.locf
df$V2 <- df$V1 %>% replace(c(FALSE, diff(df$V1) == 1), NA) %>% na.locf
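If you would rather not depend on zoo, the carry-forward step can also be sketched in base R with cummax. This is an alternative technique, not part of the answer above, and the names `x`, `is_start`, `last_start` and `out` are made up for the example. It assumes the first element always starts a run, which holds here because the first flag is set to TRUE explicitly.

```r
# Example values from the question
x <- c(1, 4, 5, 7, 10, 15, 16, 17, 20)

# TRUE at the first element of every consecutive run
is_start <- c(TRUE, diff(x) != 1)

# Positions that start a run keep their index, all others become 0;
# cummax then carries the most recent start index forward
last_start <- cummax(seq_along(x) * is_start)

# Look up the run's first value for every position
out <- x[last_start]
out
#[1]  1  4  4  7 10 15 15 15 20
```

This stays fully vectorised, so it avoids both the loop and the intermediate NA column.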