I often find myself transferring workspaces to different scratch drives when one computing system is down or busy, or loading the same workspace in two different places so that I can run two long-running packages simultaneously and save time.
Because of this, I'd really love a way to see the different objects between workspaces and a way to combine them, adding only the new, changed or updated workspace objects to a similar workspace. This would be extremely useful for me.
So far I am relying on manual note-taking and getting befuddled by my scribbles two weeks down the line. I really just want to learn some good working practices and habits that make this sort of thing easier.
Generally I would really like to learn more about workspace management and how experienced users keep workspaces for long, ongoing projects comprehensive and tidy. I often use RStudio, but when working remotely or on our HPC system it can be a bit laggy and clunky, so I tend to use the command line and interactive sessions.
I think maybe making lists of objects might be the key, but I'd like to be able to annotate things more easily, maybe with the data and parameters used to make the object etc.
Thanks.
I think one needs to build one's own function here, doing the following:
loading the workspaces one after another, using load()
renaming each element of the workspace, or putting it into a list, to prevent it being overwritten when another workspace is loaded
checking the timestamps of the workspace files with file.info()
and keeping only the newest objects, which are then saved in an up-to-date workspace
Example:
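# Create ten example workspaces, each holding one object named `dummy`
for(i in 1:10){
  dummy <- rnorm(1)
  Sys.sleep(1.3) # make sure the files get distinct modification times
  save(dummy,file=paste("test",i,".Rdata",sep=""))
}
# Load each workspace in turn, keeping its `dummy` and the file's timestamp
DUMMY <- list()
timestamps <- numeric(10)
for(i in 1:10){
  filename <- paste("test",i,".Rdata",sep="")
  load(filename) # overwrites `dummy` in the current environment
  DUMMY[[i]] <- dummy
  timestamps[i] <- file.info(filename)$mtime # stored as seconds since the epoch
}
# Keep only the object from the most recently modified file
uptodate <- which.max(timestamps)
dummy <- DUMMY[[uptodate]]
save(dummy,file="uptodate.Rdata")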
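The steps above can also be sketched without a loop that overwrites `dummy` each time: load every file into its own environment and call file.info() once on the whole vector of file names. This is only a sketch. The helper name `load_workspaces` and the file names are made up for illustration, and Sys.setFileTime() is used purely so that the demo does not have to wait between saves.

```r
# Hypothetical helper: load each .RData file into its own environment,
# so objects with the same name never overwrite each other.
load_workspaces <- function(files) {
  out <- lapply(files, function(f) {
    env <- new.env()
    load(f, envir = env)
    as.list(env, all.names = TRUE)
  })
  names(out) <- files
  out
}

# Demo with two throwaway files in a temporary directory
tmp <- tempfile("ws_demo")
dir.create(tmp)
files <- file.path(tmp, c("old.RData", "new.RData"))

dummy <- 1
save(dummy, file = files[1])
dummy <- 2
save(dummy, file = files[2])

# Set the modification times explicitly (assumption: demo only)
Sys.setFileTime(files[1], Sys.time() - 60)
Sys.setFileTime(files[2], Sys.time())

workspaces <- load_workspaces(files)

# Pick the most recently modified file and take its copy of `dummy`
newest <- files[which.max(as.numeric(file.info(files)$mtime))]
dummy <- workspaces[[newest]]$dummy
```

Because every workspace ends up as a named list, older copies stay available for comparison instead of being discarded as soon as the next file is loaded.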
I think the key thing is to load your workspaces into separate environments, then figure out how you want to merge them (if at all).
First, let's make some objects to save.
set.seed(1)
a <- data.frame(1:10, 1:10)
b <- rnorm(10)
One way to keep track of when an object was created is to set an attribute. The downside is that you have to remember to update it when you update your object. (See the last part of the post for alternatives.)
d <- structure(data.frame(b), updated=Sys.time())
attr(d, 'updated')
#[1] "2012-10-06 12:34:06 CDT"
You can assign the current time to a variable just before saving the workspace to know when you saved it (file.info, which PeterM suggested, may be a better alternative).
updated <- Sys.time()
dir.create('~/tmp') # create a directory to save workspace in.
save.image('~/tmp/ws1.RData')
d[1, 1] <- 1 #make a change to `d`
attr(d, "updated") <- Sys.time() # don't forget to update the `updated` attribute
e <- b * a # add a new object
updated <- Sys.time()
save.image('~/tmp/ws2.RData')
Now clear the workspace and load the saved workspaces. But instead of loading them into the .GlobalEnv, load them into their own environments.
rm(list=ls(all=TRUE)) # clear .GlobalEnv
w1 <- new.env()
w2 <- new.env()
load('~/tmp/ws1.RData', envir=w1)
load('~/tmp/ws2.RData', envir=w2)
> ls(w1)
[1] "a" "b" "d" "updated"
> ls(w2)
[1] "a" "b" "d" "e" "updated"
> with(w1, updated)
[1] "2012-10-06 12:34:09 CDT"
> with(w2, updated)
[1] "2012-10-06 12:35:02 CDT"
> attr(w1$d, 'updated')
[1] "2012-10-06 12:34:06 CDT"
> attr(w2$d, 'updated')
[1] "2012-10-06 12:35:02 CDT"
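Once the workspaces live in separate environments, finding what differs and merging it is mostly a matter of comparing names and contents. Here is a minimal sketch of one way to do that. The helper names `compare_workspaces` and `merge_workspaces` are hypothetical, and "changed" is taken to mean "not identical()", which is only one possible definition.

```r
# Hypothetical helper: compare two environments by object name and content.
compare_workspaces <- function(old, new) {
  old_names <- ls(old, all.names = TRUE)
  new_names <- ls(new, all.names = TRUE)
  common <- intersect(old_names, new_names)
  # An object counts as changed if the two copies are not identical()
  same <- vapply(common, function(n) {
    identical(get(n, envir = old, inherits = FALSE),
              get(n, envir = new, inherits = FALSE))
  }, logical(1))
  list(added   = setdiff(new_names, old_names),
       removed = setdiff(old_names, new_names),
       changed = common[!same])
}

# Hypothetical helper: copy added and changed objects from `new` into `old`.
merge_workspaces <- function(old, new) {
  delta <- compare_workspaces(old, new)
  for (n in c(delta$added, delta$changed)) {
    assign(n, get(n, envir = new, inherits = FALSE), envir = old)
  }
  invisible(delta)
}

# Small self-contained demo
e1 <- new.env()
e2 <- new.env()
assign("a", 1:3, envir = e1)
assign("b", "x", envir = e1)
assign("a", 1:3, envir = e2)
assign("b", "y", envir = e2)
assign("e", 42, envir = e2)

delta <- merge_workspaces(e1, e2)
delta$added    # names only present in e2
delta$changed  # names present in both, with different content
```

Applied to the environments above, compare_workspaces(w1, w2) should list e as added and d and updated as changed. Note that identical() also compares attributes, so an object whose only difference is its 'updated' attribute is reported as changed too.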
You may be interested in a function like .ls.objects (a user-defined helper, not part of base R), which summarises the objects in an environment:
> .ls.objects(pos=w1)
Type Size PrettySize Rows Columns
a data.frame 872 [1] 872 10 2
b numeric 168 [1] 168 10 NA
d data.frame 1224 [1] 1224 10 1
updated POSIXct 312 [1] 312 1 NA
> .ls.objects(pos=w2)
Type Size PrettySize Rows Columns
a data.frame 872 [1] 872 10 2
b numeric 168 [1] 168 10 NA
d data.frame 1224 [1] 1224 10 1
e data.frame 1032 [1] 1032 10 2
updated POSIXct 312 [1] 312 1 NA
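The definition of .ls.objects is not shown here, so the following is only a simplified stand-in that produces a similar table (without the PrettySize column). The name `summarise_objects` is hypothetical, and the reported sizes will vary between R versions and platforms.

```r
# Hypothetical stand-in for a helper like .ls.objects: one row per object.
summarise_objects <- function(env = globalenv()) {
  nms <- ls(env)
  objs <- mget(nms, envir = env)
  # Use dim() where available, otherwise length() with no column count
  dims <- lapply(objs, function(x) {
    d <- dim(x)
    if (is.null(d)) c(length(x), NA) else d[1:2]
  })
  data.frame(
    Type    = vapply(objs, function(x) class(x)[1], character(1)),
    Size    = vapply(objs, function(x) as.numeric(object.size(x)), numeric(1)),
    Rows    = vapply(dims, function(d) as.numeric(d[1]), numeric(1)),
    Columns = vapply(dims, function(d) as.numeric(d[2]), numeric(1)),
    row.names = nms
  )
}

# Demo on a small environment
env <- new.env()
env$a <- data.frame(x = 1:10, y = 1:10)
env$b <- rnorm(10)
info <- summarise_objects(env)
info
```

Calling it on each loaded environment, for example summarise_objects(w1) and summarise_objects(w2), gives two tables that can be compared side by side.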
You could use a custom wrapper around assign to keep track of when objects were updated.
myAssign <- function(x, value, ...) {
attr(value, "updated") <- Sys.time()
assign(x, value, ...)
}
> myAssign("b", w1$b[1:2], pos=w1)
> w1$b
[1] -0.6264538 0.1836433
attr(,"updated")
[1] "2012-10-06 12:44:55 CDT"
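The same wrapper idea can be stretched to cover the annotation the question asks about, by storing a note and the parameters used alongside the timestamp. This is a sketch only. The function name `annotatedAssign` and the attribute names "note" and "params" are assumptions made for illustration.

```r
# Hypothetical extension of the wrapper: also store a free-text note and
# the parameters used to create the object.
annotatedAssign <- function(x, value, note = NULL, params = list(), ...) {
  attr(value, "updated") <- Sys.time()
  attr(value, "note") <- note
  attr(value, "params") <- params
  assign(x, value, ...)
}

ws <- new.env()
set.seed(1)
annotatedAssign("sim", rnorm(5, mean = 2),
                note = "pilot simulation",
                params = list(n = 5, mean = 2, seed = 1),
                envir = ws)

attr(ws$sim, "note")         # the free-text annotation
attr(ws$sim, "params")$mean  # a parameter recorded at creation time
```

One caveat with this approach is that many operations, such as subsetting a vector, drop custom attributes, so the annotations have to be set again on derived objects.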
Finally, if you want to get fancy, you can make an active binding so that your object always gets a fresh 'updated' attribute whenever it changes.
f <- local({
delayedAssign('x', stop('object not found'))
function(v) {
if (!missing(v)) x <<- structure(v, updated=Sys.time())
x
}
})
makeActiveBinding('ab', f, .GlobalEnv)
> ab # Error, nothing has been assigned to it yet
Error in function (v) : object not found
> ab <- data.frame(1:10, y=rnorm(10))
> attr(ab, 'updated')
[1] "2012-10-06 12:46:53 CDT"
> ab <- data.frame(10:1, y=rnorm(10))
> attr(ab, 'updated')
[1] "2012-10-06 12:47:04 CDT"