Should I get into the habit of removing unused variables in R?

I'm currently working with relatively large data files, and my computer is not a supercomputer. I create many temporary subsets of these data sets and don't remove them from the workspace, so obviously they clutter it with many variables. But does having many unused variables affect R's performance? (i.e., does the computer's memory fill up at some point?)
When writing code, should I get into the habit of removing unused variables? Is it worth it?

x <- rnorm(1e8)
y <- mean(x)
# After this point I will not use x anymore, but I will use y.
# Should I add the following line to my code, or will there be
# no performance penalty if I skip it?
rm(x)

I don't want to add another line to my code. If there is no performance benefit, I'd rather have a cluttered workspace than cluttered code.

asked Jun 20 '13 by HBat



4 Answers

Yes, having unused objects will affect your performance, since R stores all its objects in memory. Small objects will obviously have a negligible impact, and you mostly need to remove only the really big ones (data frames with millions of rows, etc.), but an uncluttered workspace won't hurt anything.
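For example, a minimal sketch of the cost, using the vector from the question (roughly 763 MB, since 1e8 doubles take 8 bytes each):

x <- rnorm(1e8)
print(object.size(x), units = "MB")  # ~763 MB
y <- mean(x)
rm(x)  # drop the only reference to x...
gc()   # ...then garbage-collect so R can reuse (and possibly release) that memory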

The only risk is removing something that you need later. Even when using a repo, as suggested, breaking stuff accidentally is something you want to avoid.

One way to get around these issues is to make extensive use of local. When you do a computation that scatters around lots of temporary objects, you can wrap it inside a local call, which will effectively dispose of those objects for you afterward. No more having to clean up lots of i, j, x, temp.var, and whatnot.

local({
    x <- something
    for(i in seq_along(obj)) {
        temp <- some_unvectorised_function(obj[[i]], x)
        for(j in 1:temp) {
            temp2 <- some_other_unvectorised_function(temp, j)
        }
    }
    # x, i, j, temp, temp2 only exist for the duration of local(...)
})
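Note that local() returns the value of its last expression, so you can keep just the final result while the temporaries vanish with the call; a minimal sketch:

result <- local({
    tmp <- rnorm(1e6)  # temporary; gone once local() returns
    mean(tmp)
})
# result survives, tmp does not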
answered Sep 28 '22 by Hong Ooi


Adding to the suggestions above, and to assist beginners like me, here are the steps to check on R's memory (put together in the sketch after the list):

  1. List the objects in your workspace using ls().
  2. Check the size of the objects of interest using object.size(object_name) (pass the object itself, not a quoted name).
  3. Remove unused/unnecessary objects using rm("object_name").
  4. Run gc() to trigger garbage collection.
  5. Check the memory in use with memory.size() (Windows only).
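A minimal sketch of these steps; big.df is a hypothetical large object:

ls()                                      # 1. everything in the workspace
print(object.size(big.df), units = "MB")  # 2. size of one object
rm(big.df)                                # 3. remove it
gc()                                      # 4. garbage-collect
# memory.size()                           # 5. memory in use (Windows only)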

To clear the whole workspace, as if starting a new session, use rm(list = ls()) followed by gc().

If the habit of removing unused variables feels dangerous, it is always good practice to occasionally save your objects into an R image.
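For example (the file name is illustrative):

save.image("workspace-backup.RData")  # snapshot every object in the workspace
# restore later with: load("workspace-backup.RData")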

answered Sep 28 '22 by KarthikS


I think it's a good programming practice to remove unused code, regardless of language.

It's also a good practice to use a version control system like Subversion or Git to track your change history. If you do that, you can remove code without fear, because it's always possible to roll back to earlier versions if you need to.

That's fundamental to professional coding.

answered Sep 28 '22 by duffymo


Show the distribution of the largest objects in memory and return their names, based on @Peter Raynham:

memory.biggest.objects <- function(n = 10) {
  # Show the distribution of the n largest objects in the global
  # environment as a pie chart, and return their sizes in bytes
  sizes.of.objects.in.mem <- sapply(ls(envir = .GlobalEnv), FUN = function(name) {
    object.size(get(name, envir = .GlobalEnv))
  })
  topX <- sort(sizes.of.objects.in.mem, decreasing = TRUE)[1:n]

  memory.usage.stat <- c(topX, "Other" = sum(sort(sizes.of.objects.in.mem, decreasing = TRUE)[-(1:n)]))
  pie(memory.usage.stat, cex = .5, sub = make.names(date()))
  # wpie(memory.usage.stat, cex = .5)
  # Use wpie if you have MarkdownReports, from https://github.com/vertesy/MarkdownReports
  print(topX)
  print("rm(list=c( 'objectA',  'objectB'))")
  # inline_vec.char(names(topX))
  # Use inline_vec.char if you have DataInCode, from https://github.com/vertesy/DataInCode
}
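Example usage; the objects a and b are illustrative, and n should not exceed the number of objects in the workspace:

a <- rnorm(1e7)              # ~76 MB
b <- matrix(0, 4000, 4000)   # ~122 MB
memory.biggest.objects(n = 2)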
answered Sep 28 '22 by bud.dugong