
tryCatch - namespace?

I am quite new to R and I am confused about the correct usage of tryCatch. My goal is to make predictions for a large data set. If the predictions do not fit into memory, I want to work around the problem by splitting my data.

Right now, my code looks roughly as follows:

tryCatch({
  large_vector = predict(model, large_data_frame)
}, error = function(e) { # I ran out of memory
  for (i in seq(from = 1, to = dim(large_data_frame)[1], by = 1000)) {
    small_vector = predict(model, large_data_frame[i:(i+step-1), ])
    save(small_vector, tmpfile)
  }
  rm(large_data_frame) # free memory
  large_vector = NULL
  for (i in seq(from = 1, to = dim(large_data_frame)[1], by = 1000)) {
    load(tmpfile)
    unlink(tmpfile)
    large_vector = c(large_vector, small_vector)
  }
})

The point is that if no error occurs, large_vector is filled with my predictions as expected. If an error occurs, large_vector seems to exist only in the namespace of the error code - which makes sense because I declared it as a function. For the same reason, I get a warning saying that large_data_frame cannot be removed.

Unfortunately, this behavior is not what I want. I would want to assign the variable large_vector from within my error function. I figured that one possibility is to specify the environment and use assign. Thus, I would use the following statements in my error code:

rm(large_data_frame, envir = parent.env(environment()))
[...]
assign('large_vector', large_vector, parent.env(environment()))

However, this solution seems rather dirty to me. I wonder whether there is any possibility to achieve my goal with "clean" code?

[EDIT] There seems to be some confusion because I put the code above mainly to illustrate the problem, not to give a working example. Here's a minimal example that shows the namespace issue:

# Example 1 : large_vector fits into memory
rm(large_vector)
tryCatch({
  large_vector = rep(5, 1000)
}, error = function(e) {
  # do stuff to build the vector
  large_vector = rep(3, 1000)
})
print(large_vector)  # all 5

# Example 2 : pretend large_vector does not fit into memory; solution using parent environment
rm(large_vector)
tryCatch({ 
  stop();  # simulate error
}, error = function(e) {
  # do stuff to build the vector
  large_vector = rep(3, 1000)
  assign('large_vector', large_vector, parent.env(environment()))
})
print(large_vector)  # all 3

# Example 3 : pretend large_vector does not fit into memory; namespace issue
rm(large_vector)
tryCatch({ 
  stop();  # simulate error
}, error = function(e) {
  # do stuff to build the vector
  large_vector = rep(3, 1000)
})
print(large_vector)  # does not exist
Jenny asked Feb 17 '23 06:02


1 Answer

I would do something like this:

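res <- tryCatch({
  predict(model, large_data_frame)
}, error = function(e) { # I ran out of memory
  # split the rows into consecutive chunks of 1000 and predict chunk by chunk
  chunks <- split(large_data_frame,
                  ceiling(seq_len(nrow(large_data_frame)) / 1000))
  lapply(chunks, function(x) predict(model, x))
})
rm(large_data_frame)
# a list here means the error handler ran (assuming predict returns a plain vector)
if (is.list(res))
  res <- unlist(res, use.names = FALSE)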

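The idea is to return a list of prediction results from the error handler if you run out of memory. tryCatch returns the value of whichever branch ran, so you assign its result once, outside the handler, and no assign or parent.env call is needed.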
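
As a minimal sketch of that idea on the toy example from the question (the value rep(3, 1000) is just the placeholder used there, and the error message is made up): the handler simply returns the vector, and the assignment happens in the calling environment.

```r
# tryCatch() evaluates to the value of the expression if it succeeds,
# or to the value returned by the handler if an error is caught.
large_vector <- tryCatch({
  stop("simulated out-of-memory error")  # pretend the prediction failed
}, error = function(e) {
  # fallback: build the vector some other way and return it
  rep(3, 1000)
})

# large_vector now exists in the calling environment
print(length(large_vector))
```

Because nothing is assigned inside the handler, the scoping problem from Example 3 does not come up.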

Note: I am not sure of the result here, because we don't have a reproducible example.
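
For what it is worth, here is a rough self-contained sketch on simulated data. Everything in it is an assumption for illustration: the lm model, the data, the chunk size, and in particular predict_limited, a hypothetical stand-in that fails on more than 1000 rows to imitate running out of memory. It is not your actual setup.

```r
set.seed(1)

# Simulated training data and model (assumptions, not from the question)
train <- data.frame(x = rnorm(100))
train$y <- 2 * train$x + rnorm(100)
model <- lm(y ~ x, data = train)
large_data_frame <- data.frame(x = rnorm(2500))

# Hypothetical stand-in for a predict() call that runs out of memory:
# it refuses inputs with more than 1000 rows.
predict_limited <- function(model, newdata) {
  if (nrow(newdata) > 1000) stop("cannot allocate vector (simulated)")
  predict(model, newdata)
}

chunk_size <- 1000
res <- tryCatch({
  predict_limited(model, large_data_frame)
}, error = function(e) {
  # Consecutive row blocks: rows 1-1000, 1001-2000, 2001-2500
  groups <- ceiling(seq_len(nrow(large_data_frame)) / chunk_size)
  parts <- lapply(split(large_data_frame, groups),
                  function(x) predict_limited(model, x))
  # Return one plain vector, in the original row order
  unlist(parts, use.names = FALSE)
})

# Compare against a single predict() call on the full data
print(all.equal(res, unname(predict(model, large_data_frame))))
```

Flattening inside the handler means res has the same shape whichever branch ran, so the is.list check afterwards is not needed in this variant.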

agstudy answered Feb 28 '23 04:02