Combining or merging workspaces in R and general workspace management

I often find myself transferring workspaces to different scratch drives when one computing system is down or busy, or loading the same workspace twice in different places so that I can run two long-running packages simultaneously to save time.

Because of this, I'd really love a way to see which objects differ between workspaces and a way to combine them, adding only the new, changed or updated objects to an otherwise similar workspace. This would be extremely useful for me.

So far I am relying on manual note-taking and getting befuddled by my scribbles two weeks down the line. I really just want to learn some good working practices and habits that make this sort of thing easier.

More generally, I would really like to learn about workspace management and how experienced users keep workspaces for long, ongoing projects comprehensive and tidy. I often use RStudio, but when working remotely or on our HPC system it can be a bit laggy and clunky, so I tend to use the command line and interactive sessions.

I think maybe making lists of objects might be the key, but I'd like to be able to annotate things more easily, maybe with the data and parameters used to make the object etc.

Thanks.

jksl asked Oct 06 '12 13:10


2 Answers

I think you need to build your own function here, doing the following:

  • load the workspaces one after another, using:

    load()
    
  • rename each element of the workspace so that it is not overwritten when the next workspace is loaded, or put it into a list

  • check the timestamp of each workspace file with:

    file.info()
    
  • and keep only the newest objects, which are then saved in an up-to-date workspace

Example:

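# Create ten example workspaces, each holding one object called `dummy`
for (i in 1:10) {
    dummy <- rnorm(1)
    Sys.sleep(1.3)  # pause so that each file gets a distinct modification time
    save(dummy, file = paste0("test", i, ".Rdata"))
}

DUMMY <- list()            # one slot per loaded workspace
timestamps <- numeric(10)  # modification times, in seconds since the epoch

for (i in 1:10) {
    filename <- paste0("test", i, ".Rdata")
    load(filename)         # overwrites `dummy` in the current environment
    DUMMY[[i]] <- dummy    # copy it into the list before the next load
    timestamps[i] <- as.numeric(file.info(filename)$mtime)
}

# Keep only the object from the most recently modified file
uptodate <- which.max(timestamps)
dummy <- DUMMY[[uptodate]]
save(dummy, file = "uptodate.Rdata")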
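
The same idea can be generalised to workspaces that hold several objects. What follows is only a rough sketch, not a tested tool: `merge_newest` is a made-up name, and it assumes that "newest" can be decided per file (by modification time) rather than per object. It loads the files from oldest to newest into one environment, so a newer copy of an object replaces an older one with the same name.

```r
# Hypothetical helper: merge several .RData files, keeping for each object
# name the copy that comes from the most recently modified file.
merge_newest <- function(files) {
  # Sort oldest first, so that later (newer) files overwrite earlier ones
  files <- files[order(file.info(files)$mtime)]
  merged <- new.env()
  for (f in files) {
    load(f, envir = merged)  # same-named objects are overwritten here
  }
  merged
}

# Demo with two throwaway workspaces in a temporary directory
ws_dir <- tempfile("ws")
dir.create(ws_dir)
old_file <- file.path(ws_dir, "old.RData")
new_file <- file.path(ws_dir, "new.RData")

x <- 1; y <- "only in old"
save(x, y, file = old_file)
x <- 2; z <- "only in new"
save(x, z, file = new_file)

# Force distinct modification times instead of sleeping between saves
Sys.setFileTime(old_file, Sys.time() - 60)
Sys.setFileTime(new_file, Sys.time())

merged <- merge_newest(c(new_file, old_file))
sort(ls(merged))  # "x" "y" "z"
merged$x          # 2, taken from the newer file
```

Because the comparison is made per file, an object that was only changed in the older file would be lost, so this is probably only safe when one workspace is a straightforward continuation of the other.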

petermeissner answered Nov 15 '22 01:11


I think the key thing is to load your workspaces into separate environments, then figure out how you want to merge them (if at all).

First, let's make some objects to save.

set.seed(1)
a <- data.frame(1:10, 1:10)
b <- rnorm(10)

One way to keep track of when an object was created is to set an attribute. The downside is that you have to remember to update it whenever you update the object. (See the last part of the post for alternatives.)

d <- structure(data.frame(b), updated=Sys.time())
attr(d, 'updated')
#[1] "2012-10-06 12:34:06 CDT"

You can assign the current time to a variable just before saving the workspace, so that you know when it was saved (file.info(), which PeterM suggested, may be a better alternative).

updated <- Sys.time() 
dir.create('~/tmp') # create a directory to save workspace in.
save.image('~/tmp/ws1.RData')

d[1, 1] <- 1 #make a change to `d`
attr(d, "updated") <- Sys.time() # don't forget to update the `updated` attribute
e <- b * a # add a new object
updated <- Sys.time()
save.image('~/tmp/ws2.RData')

Now clear the workspace and load the saved workspaces. Instead of loading them into .GlobalEnv, though, load each one into its own environment:

rm(list=ls(all=TRUE)) # clear .GlobalEnv
w1 <- new.env()
w2 <- new.env()
load('~/tmp/ws1.RData', envir=w1)
load('~/tmp/ws2.RData', envir=w2)

> ls(w1)
[1] "a"       "b"       "d"       "updated"
> ls(w2)
[1] "a"       "b"       "d"       "e"       "updated"

> with(w1, updated)
[1] "2012-10-06 12:34:09 CDT"
> with(w2, updated)
[1] "2012-10-06 12:35:02 CDT"

> attr(w1$d, 'updated')
[1] "2012-10-06 12:34:06 CDT"
> attr(w2$d, 'updated')
[1] "2012-10-06 12:35:02 CDT"
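
Once the workspaces sit in separate environments, comparing and merging them is mostly a matter of comparing names and contents. Here is one possible sketch. `compare_envs` and `merge_into` are hypothetical names, and the small `w1` and `w2` built below are stand-ins for the environments loaded above, so that the snippet runs on its own.

```r
# Hypothetical helper: classify the objects of `new` relative to `old`
compare_envs <- function(old, new) {
  old_names <- ls(old, all.names = TRUE)
  new_names <- ls(new, all.names = TRUE)
  shared <- intersect(old_names, new_names)
  # A shared object counts as changed if the two copies are not identical
  same <- vapply(shared, function(n) {
    identical(get(n, envir = old, inherits = FALSE),
              get(n, envir = new, inherits = FALSE))
  }, logical(1))
  list(added   = setdiff(new_names, old_names),
       removed = setdiff(old_names, new_names),
       changed = shared[!same])
}

# Hypothetical helper: copy added and changed objects from `new` into `old`
merge_into <- function(old, new) {
  d <- compare_envs(old, new)
  for (n in c(d$added, d$changed)) {
    assign(n, get(n, envir = new, inherits = FALSE), envir = old)
  }
  invisible(d)
}

# Stand-ins for the two loaded workspaces
w1 <- new.env()
w2 <- new.env()
assign("a", 1:10, envir = w1)
assign("a", 1:10, envir = w2)
assign("d", data.frame(b = 1:3), envir = w1)
assign("d", data.frame(b = c(9L, 2L, 3L)), envir = w2)
assign("e", "new object", envir = w2)

str(compare_envs(w1, w2))  # added: "e", removed: none, changed: "d"
merge_into(w1, w2)
ls(w1)                     # "a" "d" "e"
```

Note that identical() also compares attributes, so an object whose data is unchanged but whose updated attribute differs would be reported as changed. Objects that exist only in the old workspace are reported as removed but are left in place by merge_into.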

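You may be interested in a helper function like .ls.objects, which lists each object with its type and size (it is not part of base R, so it has to be defined first):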

> .ls.objects(pos=w1)
              Type Size PrettySize Rows Columns
a       data.frame  872    [1] 872   10       2
b          numeric  168    [1] 168   10      NA
d       data.frame 1224   [1] 1224   10       1
updated    POSIXct  312    [1] 312    1      NA
> .ls.objects(pos=w2)
              Type Size PrettySize Rows Columns
a       data.frame  872    [1] 872   10       2
b          numeric  168    [1] 168   10      NA
d       data.frame 1224   [1] 1224   10       1
e       data.frame 1032   [1] 1032   10       2
updated    POSIXct  312    [1] 312    1      NA
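
If you don't have .ls.objects to hand, a cut-down version is easy to write. The sketch below is not the original function; `summarise_env` is a hypothetical name, and it only reports the class, the size in bytes and the number of rows (or the length) of each object in an environment.

```r
# Hypothetical stand-in for .ls.objects: one row per object in `env`
summarise_env <- function(env) {
  nms <- ls(env)
  data.frame(
    Type = vapply(nms, function(n) class(get(n, envir = env))[1],
                  character(1)),
    Size = vapply(nms, function(n) as.numeric(object.size(get(n, envir = env))),
                  numeric(1)),
    Rows = vapply(nms, function(n) as.integer(NROW(get(n, envir = env))),
                  integer(1)),
    row.names = nms
  )
}

# Demo on a small environment
w <- new.env()
assign("a", data.frame(x = 1:10, y = 1:10), envir = w)
assign("b", rnorm(10), envir = w)
summarise_env(w)
```

The sizes it reports depend on the platform and R version, so they will not necessarily match the numbers shown above.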

You could use a custom wrapper around assign to keep track of when objects were updated.

myAssign <- function(x, value, ...) {
  attr(value, "updated") <- Sys.time()
  assign(x, value, ...)
}

> myAssign("b", w1$b[1:2], pos=w1)
> w1$b
[1] -0.6264538  0.1836433
attr(,"updated")
[1] "2012-10-06 12:44:55 CDT"
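
The same wrapper idea can carry the kind of annotation asked about in the question (the data and parameters used to make an object). This is just a sketch: `annotatedAssign` and the attribute names `note` and `params` are made up for illustration.

```r
# Hypothetical extension of myAssign(): also record a free-text note and
# the parameters that were used to build the object.
annotatedAssign <- function(x, value, note = NULL, params = list(), ...) {
  attr(value, "updated") <- Sys.time()
  attr(value, "note") <- note      # a NULL note simply leaves no attribute
  attr(value, "params") <- params
  assign(x, value, ...)
}

w <- new.env()
annotatedAssign("fit_input", rnorm(5, mean = 3),
                note = "simulated test data",
                params = list(n = 5, mean = 3),
                envir = w)

attr(w$fit_input, "note")         # "simulated test data"
attr(w$fit_input, "params")$mean  # 3
```

Keep in mind that many operations (subsetting a vector, for example) drop custom attributes, so the annotation is only as reliable as your habit of assigning through the wrapper.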

Finally, if you want to get fancy, you can make an active binding so that the object gets a fresh updated attribute whenever a new value is assigned to it.

f <- local({
  delayedAssign('x', stop('object not found'))
  function(v) {
    if (!missing(v)) x <<- structure(v, updated=Sys.time())
    x
  }
})
makeActiveBinding('ab', f, .GlobalEnv)
> ab # Error, nothing has been assigned to it yet
Error in function (v)  : object not found
> ab <- data.frame(1:10, y=rnorm(10))
> attr(ab, 'updated')
[1] "2012-10-06 12:46:53 CDT"
> ab <- data.frame(10:1, y=rnorm(10))
> attr(ab, 'updated')
[1] "2012-10-06 12:47:04 CDT"

GSee answered Nov 15 '22 01:11