Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

replace values of a dataframe based on values of the previous rows

Tags:

dataframe

r

Supposing I have the data frame below:

key         values
A           118    
B           118
B           118
C           100
A            90
A            90
A            90

I want to replace the values that have the same key of the previous row with 0. Then, the data would be like:

key         values
A           118    
B           118
B             0
C           100
A            90
A             0
A             0

For a data frame called 'm', I used for and if:

for (i in 1:(length(m[,1])-1)) {

   if (m[,1][i+1]==m[,1][i]) {m[,2][i+1]<- 0}

 }

It works, but my data has 2104776 rows and 23 columns. So, it took 3 hours.

I'm wondering if there is a more efficient way to do this task. Any ideas would be appreciated. Thank you.

like image 881
Guilherme Adame Avatar asked Feb 07 '17 18:02

Guilherme Adame


3 Answers

We can use lag from dplyr and check if the value in key is same as its previous value and assign 0 to those who match.

library(dplyr)
df$values[df$key == lag(df$key)] <- 0

df
#  key  value
#1   A  118
#2   B  118
#3   B    0
#4   C  100
#5   A   90
#6   A    0
#7   A    0
like image 176
Ronak Shah Avatar answered Oct 14 '22 19:10

Ronak Shah


In data.table you could use rleid to group, like this:

library(data.table)
setDT(df)
df[, values := c(head(values, 1), rep(0L, .N-1)), by=rleid(key)]
df
   key values
1:   A    118
2:   B    118
3:   B      0
4:   C    100
5:   A     90
6:   A      0
7:   A      0

c(head(values, 1), rep(0L, .N-1)) takes the first element of values and prepends this to a vector of 0s the length of the groups minus 1.

like image 30
lmo Avatar answered Oct 14 '22 19:10

lmo


You basically take cusmsum of the keys cols and then subtract an index of length of those key to give you a vector which should hold the values and you can convert everything else to 0.

i=cumsum(rle(as.character(df$key))$lengths)-(rle(as.character(df$key))$lengths-1)
df$values[-i]=0
like image 25
Chirayu Chamoli Avatar answered Oct 14 '22 20:10

Chirayu Chamoli