
Apply a match and replace function over series of rows in a dataframe in order

Tags: dataframe, r, apply

Starting dataframe

data_start <- data.frame(marker = c("yes","yes","no","yes","no"),
                         id_out = c(5,3,1,1,7), 
                         id_new = c(6,8,9,4,2))

> data_start
  marker id_out id_new
1    yes      5      6
2    yes      3      8
3     no      1      9
4    yes      1      4
5     no      7      2

Add three new columns, var1 to var3, filled with NA, then put the starting values in the first row.

data_start[,c("var1", "var2", "var3")] <- NA
vars <- c(5,3,1)
data_start[1, 4:6] <- vars

> data_start
  marker id_out id_new var1 var2 var3
1    yes      5      6    5    3    1
2    yes      3      8   NA   NA   NA
3     no      1      9   NA   NA   NA
4    yes      1      4   NA   NA   NA
5     no      7      2   NA   NA   NA

I would like to update the var1:var3 columns by applying a function to each row: IF marker = yes AND id_out matches ANY of var1:var3, replace the matching value(s) in var1:var3 with id_new. I found the solution below, but it handles only one row at a time, and the var1:var3 values of each following row still have to be filled in.

data_start[1, 4:6][data_start[1, 4:6] == data_start[1,"id_out"]] <- data_start[1,"id_new"]
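
To make clear what that one-liner does, here is the same replacement for row 1 unpacked into separate steps. This is only an illustrative sketch; `row_vals` and `hit` are names introduced here and are not part of the original code.

```r
# Rebuild the starting data frame from the question
data_start <- data.frame(marker = c("yes", "yes", "no", "yes", "no"),
                         id_out = c(5, 3, 1, 1, 7),
                         id_new = c(6, 8, 9, 4, 2))
data_start[, c("var1", "var2", "var3")] <- NA
data_start[1, 4:6] <- c(5, 3, 1)

row_vals <- data_start[1, 4:6]               # one-row data frame holding var1:var3
hit <- row_vals == data_start[1, "id_out"]   # logical matrix marking cells equal to id_out
row_vals[hit] <- data_start[1, "id_new"]     # overwrite only the matching cells
data_start[1, 4:6] <- row_vals               # write the updated values back into row 1
```

Note that this step does not check marker and does not touch rows 2 to 5, which is the part still missing.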

Each row must also start from the values in the row above before the replacement is applied again.

The final output would look like this: the values stay unchanged when marker = no, and each row builds on the one before it.

> data_final
  marker id_out id_new var1 var2 var3
1    yes      5      6    6    3    1
2    yes      3      8    6    8    1
3     no      1      9    6    8    1
4    yes      1      4    6    8    4
5     no      7      2    6    8    4
asked May 17 '16 20:05 by panstotts


1 Answer

This approach uses only base R and works with any number of columns:

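cols <- c("var1", "var2", "var3")

# Process one column at a time, walking down the rows
for (var in cols) {
  for (i in seq_len(nrow(data_start))) {
    # Carry the value forward from the row above
    if (i > 1) {
      data_start[i, var] <- data_start[i - 1, var]
    }
    # Swap in id_new when marker is "yes" and the value equals id_out
    if (data_start[i, "marker"] == "yes" && data_start[i, var] == data_start[i, "id_out"]) {
      data_start[i, var] <- data_start[i, "id_new"]
    }
  }
}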
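
The same logic can also be written row by row, updating all tracked columns at once instead of looping over them one column at a time. The following is a minimal sketch of that variant, not a different algorithm; the function name `update_vars` and the variable `current` are hypothetical names chosen for illustration.

```r
# Hypothetical helper: walk down the rows, keeping the tracked values in a vector
update_vars <- function(df, cols) {
  current <- unlist(df[1, cols])   # starting values taken from row 1
  for (i in seq_len(nrow(df))) {
    if (df[i, "marker"] == "yes") {
      # Positions of tracked values equal to id_out (which() drops any NA)
      hit <- which(current == df[i, "id_out"])
      current[hit] <- df[i, "id_new"]
    }
    # Write the (possibly updated) values into this row
    df[i, cols] <- current
  }
  df
}

# Rebuild the starting data frame from the question
data_start <- data.frame(marker = c("yes", "yes", "no", "yes", "no"),
                         id_out = c(5, 3, 1, 1, 7),
                         id_new = c(6, 8, 9, 4, 2))
data_start[, c("var1", "var2", "var3")] <- NA
data_start[1, 4:6] <- c(5, 3, 1)

data_final <- update_vars(data_start, c("var1", "var2", "var3"))
```

With the example data, `data_final` should match the expected output shown in the question. Keeping the running values in a plain vector means rows with marker = no simply receive a copy of the previous state.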
answered Nov 19 '22 16:11 by s-heins