Dynamically update input dataframe at each iteration of function without global assignment

I have (1) a reference table of ratings, and (2) a function which randomly generates results based on these ratings and updates the ratings based upon the generated result.

Although there are easier solutions to the reproducible example below, the intended application is to simulate results between opponents based upon their Elo ratings, with ratings being updated after each round in order to run the simulations 'hot'.

Here, I have a reference table of ratings ref and use the function genResult to generate a random result and update the reference table using global assignment.

set.seed(123)
ref <- data.frame(id = LETTERS[1:5],
                  rating = round(runif(5, 100, 200)))

genResult <- function(ref) {

  id_i <- LETTERS[floor(runif(1, 1, 5))]

  score_i <- round(rnorm(1, 0, 20))

  ref[ref$id == id_i,]$rating <- ref[ref$id == id_i,]$rating + score_i

  result_i <- data.frame(id = id_i, score = score_i)

  # assign('ref', ref, envir=.GlobalEnv)
  ref <<- ref

  return(list(result_i, ref))
}

Replicating this function twice, we can see ref is updated as expected.

replicate(2, genResult(ref), simplify = F)

This returns the following, where the reference table is updated in each of the two iterations:

[[1]]
[[1]][[1]]
  id score
1  A     1

[[1]][[2]]
  id rating
1  A    130
2  B    179
3  C    141
4  D    188
5  E    194


[[2]]
[[2]][[1]]
  id score
1  C    -2

[[2]][[2]]
  id rating
1  A    130
2  B    179
3  C    139
4  D    188
5  E    194

Now suppose I want to repeat the replicated call above: simulate 3 separate instances of 5 results each, with the ratings updated dynamically within each instance, and output only the results. I create the reference table ref again and define a similar function that also uses global assignment:

set.seed(123)
ref <- data.frame(id = LETTERS[1:5],
                  rating = round(runif(5, 100, 200)))

genResult2 <- function(ref) {

  id_i <- LETTERS[floor(runif(1, 1, 5))]

  score_i <- round(rnorm(1, 0, 20))

  ref[ref$id == id_i,]$rating <- ref[ref$id == id_i,]$rating + score_i

  result_i <- data.frame(id = id_i, score = score_i)

  ref <<- ref

  return(result_i)
}

I then use lapply and collapse the list of results into a single data frame:

lapply(1:3, function(i) {

  ref_i <- ref

  replicate(5, genResult2(ref_i), simplify = F) %>% 
    plyr::rbind.fill() %>% 
    mutate(i)

}) %>% 
  plyr::rbind.fill()

Returning:

   id score i
1   A     1 1
2   C    -2 1
3   B     9 1
4   A    26 1
5   A    -9 1
6   D    10 2
7   D     8 2
8   C     5 2
9   A    36 2
10  C    17 2
11  B    14 3
12  B   -15 3
13  B    -4 3
14  A   -22 3
15  B   -13 3

Now this seems to work as expected, but (i) it feels really ugly, and (ii) I've read countless times that global assignment can and will cause unexpected injury.

Can anyone suggest a better solution?

jogall asked Jun 19 '18 14:06


2 Answers

If each iteration depends on the result of the previous one, that is usually a good sign that you should use an old-fashioned for loop rather than replicate or the apply functions. (Another possibility would be Reduce with its accumulate argument set to TRUE.)

This gives the same results as the code you posted, plus the updated rating. I used a for loop and made your function return ref as well:

genResult3 <- function(ref) {

  id_i <- LETTERS[floor(runif(1, 1, 5))]

  score_i <- round(rnorm(1, 0, 20))

  ref[ref$id == id_i,]$rating <- ref[ref$id == id_i,]$rating + score_i

  result_i <- data.frame(id = id_i, score = score_i)

  #ref <<- ref

  return(list(result_i,ref)) # added ref to output
}

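library(dplyr) # for %>%, mutate() and left_join()

lapply(1:3, function(i) {
  res <- vector("list", 5) # pre-allocate a list of 5 results
  for (k in 1:5){
    gr <- genResult3(ref)
    res[[k]] <- gr[[1]] # get result
    ref      <- gr[[2]] # update rating (local copy; the global ref is unchanged)
    res[[k]] <- left_join(res[[k]], ref, by = "id") # combine for output
  }
  plyr::rbind.fill(res) %>% 
    mutate(i)

}) %>% 
  plyr::rbind.fill()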

Returning:

   id score rating i
1   A     1    130 1
2   C    -2    139 1
3   B     9    188 1
4   A    26    156 1
5   A    -9    147 1
6   D    10    198 2
7   D     8    206 2
8   C     5    146 2
9   A    36    165 2
10  C    17    163 2
11  B    14    193 3
12  B   -15    178 3
13  B    -4    174 3
14  A   -22    107 3
15  B   -13    161 3
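
The Reduce alternative mentioned above could look roughly like the following. This is only a sketch under some assumptions: step() is a hypothetical helper that takes and returns the whole state (ratings plus the latest result), and sample() is used to pick the id instead of the original floor(runif(...)), so the numbers will differ from the output above.

```r
# Hypothetical helper: one simulation step. It receives the previous state
# and returns the new state; the second argument (the step index) is unused.
step <- function(state, k) {
  ref <- state$ref
  id_k <- sample(ref$id, 1)              # pick one id at random
  score_k <- round(rnorm(1, 0, 20))      # random score for that id
  ref$rating[ref$id == id_k] <- ref$rating[ref$id == id_k] + score_k
  list(ref = ref,
       result = data.frame(id = id_k, score = score_k,
                           rating = ref$rating[ref$id == id_k],
                           stringsAsFactors = FALSE))
}

set.seed(123)
ref <- data.frame(id = LETTERS[1:5],
                  rating = round(runif(5, 100, 200)),
                  stringsAsFactors = FALSE)

# accumulate = TRUE keeps every intermediate state;
# the first element of `states` is the initial state passed as `init`.
states <- Reduce(step, 1:5,
                 init = list(ref = ref, result = NULL),
                 accumulate = TRUE, simplify = FALSE)

results   <- do.call(rbind, lapply(states[-1], `[[`, "result"))
final_ref <- states[[length(states)]]$ref
```

Here the state is passed explicitly from one step to the next, so nothing outside step() is modified and the original ref stays as it was.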

Moody_Mudskipper answered Nov 17 '22 07:11


You can create a new environment with new.env() and keep the reference table there instead of in the global environment.

Applying that idea to your first function gives this:

set.seed(123)
ref1 <- data.frame(id = LETTERS[1:5],
                  rating = round(runif(5, 100, 200)))
ref1

refEnv <- new.env()
refEnv$ref = ref1

genResult <- function(ref) {

  id_i <- LETTERS[floor(runif(1, 1, 5))]

  score_i <- round(rnorm(1, 0, 20))

  ref[ref$id == id_i,]$rating <- ref[ref$id == id_i,]$rating + score_i

  result_i <- data.frame(id = id_i, score = score_i)

  assign('ref', ref, envir=refEnv)

  return(list(result_i, ref))
}
replicate(2, genResult(refEnv$ref), simplify = F)

ref1
refEnv$ref

You will see that the original ref1 is not touched and remains the same, while refEnv$ref contains the result from the last iteration.
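
The reason this works is that environments are modified in place, whereas a data frame assigned to another name behaves like a copy once either one is changed. A minimal sketch of that difference, where env_demo, snapshot and bump() are hypothetical names used only for illustration:

```r
env_demo <- new.env()
env_demo$ref <- data.frame(id = c("A", "B"), rating = c(100, 200),
                           stringsAsFactors = FALSE)

# A plain copy of the data frame, taken before any update
snapshot <- env_demo$ref

# Hypothetical helper: updates the data frame stored inside `env`.
# Because `env` is an environment, the change is visible to the caller.
bump <- function(env, id, by) {
  env$ref$rating[env$ref$id == id] <- env$ref$rating[env$ref$id == id] + by
  invisible(NULL)
}

bump(env_demo, "A", 5)

env_demo$ref$rating  # the copy stored in the environment has changed
snapshot$rating      # the earlier copy has not
```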

Applying the same idea to your second function with lapply:

set.seed(123)
ref1 <- data.frame(id = LETTERS[1:5],
                   rating = round(runif(5, 100, 200)))
ref1

refEnv <- new.env()
refEnv$ref = ref1


genResult2 <- function(ref) {

  id_i <- LETTERS[floor(runif(1, 1, 5))]

  score_i <- round(rnorm(1, 0, 20))

  ref[ref$id == id_i,]$rating <- ref[ref$id == id_i,]$rating + score_i

  result_i <- data.frame(id = id_i, score = score_i)

  assign('ref', ref, envir=refEnv)

  return(result_i)
}

# Run 3 instances of 5 results each; every call reads the current refEnv$ref and writes the update back.
lapply(1:3, function(i) {

  replicate(5, genResult2(refEnv$ref), simplify = F) %>% 
    plyr::rbind.fill() %>% 
    mutate(i)

}) %>% 
  plyr::rbind.fill()

ref1
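
A variation on the same idea, not taken from the code above, is to keep the state in a function's enclosing environment (a closure), so that each simulated instance gets its own private copy of the ratings. This is a sketch under assumptions: make_simulator() and its two returned functions are hypothetical names, and sample() replaces the original floor(runif(...)) id selection.

```r
# Hypothetical factory: each call creates a private copy of `ref`.
make_simulator <- function(ref) {
  force(ref)
  gen_result <- function() {
    id_i <- sample(ref$id, 1)
    score_i <- round(rnorm(1, 0, 20))
    # `<<-` finds `ref` in the enclosing make_simulator() environment first,
    # so the global environment is never modified.
    ref$rating[ref$id == id_i] <<- ref$rating[ref$id == id_i] + score_i
    data.frame(id = id_i, score = score_i, stringsAsFactors = FALSE)
  }
  list(gen_result = gen_result,
       ratings = function() ref)   # read the current private ratings
}

set.seed(123)
ref <- data.frame(id = LETTERS[1:5],
                  rating = round(runif(5, 100, 200)),
                  stringsAsFactors = FALSE)

sims <- lapply(1:3, function(i) {
  sim <- make_simulator(ref)       # fresh state for each instance
  res <- do.call(rbind, replicate(5, sim$gen_result(), simplify = FALSE))
  res$i <- i
  list(results = res, ratings = sim$ratings())
})

results <- do.call(rbind, lapply(sims, `[[`, "results"))
```

Each of the three instances starts from the same ref and updates only its own copy, which may or may not be what you want compared with carrying refEnv$ref forward across instances.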

SeGa answered Nov 17 '22 09:11