Supposing I have the data frame below:
key values
A 118
B 118
B 118
C 100
A 90
A 90
A 90
I want to replace the values that have the same key of the previous row with 0. Then, the data would be like:
key values
A 118
B 118
B 0
C 100
A 90
A 0
A 0
For a data frame called 'm', I used for and if:
for (i in 1:(length(m[,1])-1)) {
if (m[,1][i+1]==m[,1][i]) {m[,2][i+1]<- 0}
}
It works, but my data has 2104776 rows and 23 columns. So, it took 3 hours.
I'm wondering if there is a more efficient way to do this task. Any ideas would be appreciated. Thank you.
We can use lag
from dplyr
and check if the value in key
is same as its previous value and assign 0 to those who match.
library(dplyr)
df$values[df$key == lag(df$key)] <- 0
df
# key value
#1 A 118
#2 B 118
#3 B 0
#4 C 100
#5 A 90
#6 A 0
#7 A 0
In data.table
you could use rleid
to group, like this:
library(data.table)
setDT(df)
df[, values := c(head(values, 1), rep(0L, .N-1)), by=rleid(key)]
df
key values
1: A 118
2: B 118
3: B 0
4: C 100
5: A 90
6: A 0
7: A 0
c(head(values, 1), rep(0L, .N-1))
takes the first element of values and prepends this to a vector of 0s the length of the groups minus 1.
You basically take cusmsum of the keys cols and then subtract an index of length of those key to give you a vector which should hold the values and you can convert everything else to 0.
i=cumsum(rle(as.character(df$key))$lengths)-(rle(as.character(df$key))$lengths-1)
df$values[-i]=0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With