Why does R store the loop variable/index/dummy in memory?

Tags: for-loop, r

I've noticed that R keeps the index from for loops stored in the global environment, e.g.:

for (ii in 1:5){ }

print(ii)
# [1] 5
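The same thing happens when the loop runs over a list: after the loop, the index is still bound to the last element, which is where the memory concern below comes from. A minimal sketch, where `frames` is a hypothetical list standing in for a list of data.tables:

```r
# Hypothetical stand-in for a list of data.tables
frames <- list(a = data.frame(x = 1:3), b = data.frame(x = 4:6))

for (fr in frames) {
  # ... work with fr ...
}

# After the loop, `fr` still refers to the last element of `frames`
print(fr$x)
# [1] 4 5 6

# Removing it by hand, as described in the question
rm(fr)
exists("fr", inherits = FALSE)
# [1] FALSE
```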

Is it common for people to have any need for this index after running the loop?

I never use it, and I am forced to remember to add rm(ii) after every loop I run: first, because I'm anal about keeping my namespace clean, and second, for memory, because I sometimes loop over lists of data.tables (in my code right now, I have 357MB worth of dummy variables wasting space).

Is there an easy way to get around this annoyance? Perfect would be a global option to set (à la options(keep_for_index = FALSE)); something like for(ii in 1:5, keep_index = FALSE) could be acceptable as well.

asked Apr 14 '15 at 23:04 by MichaelChirico


2 Answers

I agree with the comments above. Even if you have to use a for loop (relying only on side effects, not on functions' return values), it would be a good idea to structure your code in several functions and store your data in lists.

However, there is a way to "hide" the index and all temporary variables created inside the loop, by calling the for function in a separate environment:

do.call(`for`, alist(i, 1:3, {
  # ...
  print(i)
  # ... 
}), envir = new.env())
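A quick check that the do.call() approach leaves no index behind, plus a variant using base R's local(), which likewise evaluates its block in a fresh environment. The local() variant is an illustration added here, not part of the original answer; the first line just clears any stale `i` or `j` so the checks are meaningful.

```r
# Clear any stale indices (rm only warns if they don't exist)
suppressWarnings(rm(i, j))

# Original approach: evaluate `for` in a throwaway environment
do.call(`for`, alist(i, 1:3, {
  print(i)
}), envir = new.env())
exists("i", inherits = FALSE)
# [1] FALSE

# Variant: local() also runs its expression in a new environment
local({
  for (j in 1:3) print(j)
})
exists("j", inherits = FALSE)
# [1] FALSE
```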

But if you can put your code in a function, the solution is more elegant:

for_each <- function(x, FUN) {
  for(i in x) {
    FUN(i)
  }
}

for_each(1:3, print)

Note that with a for_each-like construct you don't even see the index variable.
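A usage sketch of for_each with a list and an anonymous function. `frames` is a hypothetical list standing in for a list of data.tables; the loop index `i` exists only inside for_each()'s own frame, so nothing is left in the calling environment.

```r
for_each <- function(x, FUN) {
  for (i in x) {
    FUN(i)
  }
}

# Start from a clean slate so the check below is meaningful
if (exists("i", inherits = FALSE)) rm(i)

# Hypothetical list standing in for a list of data.tables
frames <- list(a = data.frame(x = 1:3), b = data.frame(x = 4:6))

# Side effect only: prints [1] 3 once per element
for_each(frames, function(df) print(nrow(df)))

# The index never reached this environment
exists("i", inherits = FALSE)
# [1] FALSE
```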

answered Nov 15 '22 at 07:11 by bergant


In order to do what you suggest, R would have to change the scoping rules for for loops. This will likely never happen, because I'm sure there is code out there in packages that relies on it. You may not use the index after the for loop, but since a loop can exit early with break, the final iteration value isn't always known ahead of time. Making this a global option would likewise cause problems for existing code in working packages.
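A small illustration of the break point: code like this relies on the index surviving the loop, because only after the loop runs do we know where it stopped. The threshold of 20 is arbitrary.

```r
# Find the first value whose square exceeds 20
for (ii in 1:10) {
  if (ii^2 > 20) break
}

# `ii` records where the loop stopped
print(ii)
# [1] 5
```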

As pointed out, it's far more common to use sapply or lapply loops in R. Something like

for(i in 1:4) {
   lm(data[, 1] ~ data[, i])
}

becomes

sapply(1:4, function(i) {
   lm(data[, 1] ~ data[, i])
})
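To make the sketch above runnable, the following assumes `data` is the built-in mtcars data frame and skips column 1 so the response is not regressed on itself. It uses lapply rather than sapply, since sapply can simplify a list of equal-length lm objects into a matrix of list elements; that swap is a choice made here, not in the original answer.

```r
data <- mtcars  # assumption: any data frame with numeric columns would do

# One simple regression of column 1 (mpg) on each of columns 2 to 4
fits <- lapply(2:4, function(i) {
  lm(data[, 1] ~ data[, i])
})

length(fits)
# [1] 3
class(fits[[1]])
# [1] "lm"
```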

You shouldn't be afraid of functions in R. After all, R is a functional language.

It's fine to use for loops for more control, but you will have to take care of removing the index variable with rm(), as you've pointed out. Unless you're using a different index variable in each loop, I'm surprised that they are piling up. I'm also surprised that, if they are data.tables, they are taking up additional memory, since as far as I know data.tables don't make deep copies by default. The only memory "price" you would pay is a simple pointer.

answered Nov 15 '22 at 06:11 by MrFlick