
R - store functions in a data.frame

I would like to return a matrix/data.frame in which each row contains the arguments and the content of a file.

However, there may be many files, so I would prefer to load each file lazily, so that it is only read when its content is actually requested. The function below loads the files eagerly if as.func=F.

It would be perfect if it could load them lazily, but it would also be acceptable if, instead of the content, a function were returned that reads the content.

I can make functions that read the content (see below with as.func=T), but for some reason I cannot put them into the data.frame that is returned.

load_parallel_results <- function(resdir,as.func=FALSE) {
  ## Find files called .../stdout
  stdoutnames <- list.files(path=resdir, pattern="stdout", recursive=TRUE)
  ## Find files called .../stderr
  stderrnames <- list.files(path=resdir, pattern="stderr", recursive=TRUE)
  if(as.func) {
    ## Create functions to read them
    stdoutcontents <-
      lapply(stdoutnames, function(x) { force(x); return(function() { return(paste(readLines(paste(resdir,x,sep="/")),collapse="\n")) } ) } )
    stderrcontents <-
      lapply(stderrnames, function(x) { force(x); return(function() { return(paste(readLines(paste(resdir,x,sep="/")),collapse="\n")) } ) } )
  } else {
    ## Read them
    stdoutcontents <-
      lapply(stdoutnames, function(x) { return(paste(readLines(paste(resdir,x,sep="/")),collapse="\n")) } )
    stderrcontents <-
      lapply(stderrnames, function(x) { return(paste(readLines(paste(resdir,x,sep="/")),collapse="\n")) } )
  }
  if(length(stdoutnames) == 0) {
    ## Return empty data frame if no files found
    return(data.frame())
  }

  ## Make the columns containing the variable values
  m <- matrix(unlist(strsplit(stdoutnames, "/")),nrow = length(stdoutnames),byrow=TRUE)
  mm <- as.data.frame(m[,c(FALSE,TRUE)])
  ## Append the stdout and stderr column
  mmm <- cbind(mm,unlist(stdoutcontents),unlist(stderrcontents))
  colnames(mmm) <- c(strsplit(stdoutnames[1],"/")[[1]][c(TRUE,FALSE)],"stderr")
  ## Example:
  ## parallel --results my/res/dir --header : 'echo {};seq {myvar1}' ::: myvar1 1 2 ::: myvar2 A B

  ##  > load_parallel_results("my/res/dir")
  ##       myvar1 myvar2 stdout      stderr
  ##  [1,] "1"    "A"    "1 A\n1"    ""
  ##  [2,] "1"    "B"    "1 B\n1"    ""
  ##  [3,] "2"    "A"    "2 A\n1\n2" ""
  ##  [4,] "2"    "B"    "2 B\n1\n2" ""
  return(mmm)
}

Background

GNU Parallel has a --results option that stores output in a structured way. If there are 1000000 output files, they may be hard to manage. R is good for that, but it would be awfully slow if you had to read all 1000000 files just to get the ones where argument 1 = "Foo" and argument 2 = "Bar".

Ole Tange asked Jan 04 '14 15:01


3 Answers

Unfortunately I don't think you can save a function in a data.frame column. But you could store the deparsed text of the function and evaluate it when needed:

e.g.

myFunc <- function(x) { print(x) }
# convert the function to text
funcAsText <- deparse(myFunc)

# convert the text back to a function
newMyFunc <- eval(parse(text=funcAsText))

# now you can use the function newMyFunc exactly like myFunc
newMyFunc("foo")

> [1] "foo"

EDIT:
Since there are many files, I suggest you simply store a string indicating the type of each file and write a function that understands the types and reads the file accordingly; you can then call it when needed, passing the type and the file path.
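A minimal sketch of that idea, under some assumptions: the data frame stores only the relative path of each file (rather than a type string), and a hypothetical helper read_result() does the reading. The directory layout and the names resdir, index and read_result are made up for illustration and are not taken from the question's function.

```r
# Hypothetical helper: read one result file on demand, returned as one string.
read_result <- function(resdir, relpath) {
  paste(readLines(file.path(resdir, relpath)), collapse = "\n")
}

# Build a tiny demo directory so the sketch is runnable (layout is made up).
resdir <- tempfile("results")
for (v in c("1", "2")) {
  d <- file.path(resdir, "myvar1", v)
  dir.create(d, recursive = TRUE)
  writeLines(paste("output for", v), file.path(d, "stdout"))
}

# The index holds only strings; no file content is read at this point.
paths <- list.files(resdir, pattern = "^stdout$", recursive = TRUE)
index <- data.frame(
  myvar1 = basename(dirname(paths)),
  path   = paths,
  stringsAsFactors = FALSE
)

# Filter on the arguments first, then read only the files that matched.
wanted   <- index[index$myvar1 == "2", ]
contents <- vapply(wanted$path, function(p) read_result(resdir, p), character(1))
```

With this layout the filtering step touches only the index, so the cost of reading is paid just for the rows that survive the filter.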

digEmAll answered Oct 08 '22 17:10


(Without reading the question body:)

You can store functions in a data.frame like this:

df <- data.frame(fun = 1:3)
df$fun <- c(mean, sd, function(x) x^2)

I am not sure whether this will break other things, so consider using tibble or data.table (from the packages of the same name), which properly support columns of arbitrary object types.
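Applied to the lazy-loading case, the same list-column trick might look like the sketch below. It uses only base R; make_reader() is a hypothetical factory, and the two temporary files stand in for real result files.

```r
# Hypothetical factory: returns a closure that reads `path` only when called.
make_reader <- function(path) {
  force(path)  # fix the value of path now, not at call time
  function() paste(readLines(path), collapse = "\n")
}

# Two throwaway files for the demo.
files <- c(tempfile(), tempfile())
writeLines("first", files[1])
writeLines(c("second", "file"), files[2])

df <- data.frame(id = 1:2)
# A list with one element per row is stored as a list-column.
df$stdout <- lapply(files, make_reader)

# Nothing has been read so far. [[ ]] extracts the closure, () runs it.
second <- df$stdout[[2]]()

# Select by another column first, then read only what matched.
first <- df$stdout[df$id == 1][[1]]()
```

Note that the column holds functions, so anything that expects atomic columns (printing, writing to CSV, and so on) may not handle it gracefully.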

jan-glx answered Oct 08 '22 19:10


You can use 2D lists to store your functions. Obviously, you lose some of the checks you get with DFs, but that's the whole point here:

> funs <- c(replicate(5, function(x) NULL), replicate(5, function(y) TRUE))
> names <- as.list(letters[1:10])
> # df doesn't work
> df <- data.frame(names=names)
> df.2 <- cbind(df, funs)
Error in as.data.frame.default(x[[i]], optional = TRUE) : 
  cannot coerce class ""function"" to a data.frame
> # but 2d lists do
> lst.2d <- cbind(funs, names)
> lst.2d[2, 1]
$funs
function (x) 
  NULL
> lst.2d[6, 1]
$funs
function (y) 
  TRUE
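
To actually call a stored function, extract it with double brackets; single brackets return a length-one list, as in the output above. A small sketch with made-up names (funs4, labels4, m):

```r
# A 4 x 2 list matrix: column 1 holds functions, column 2 holds labels.
funs4   <- list(function(x) NULL, function(x) NULL,
                function(y) TRUE, function(y) TRUE)
labels4 <- as.list(letters[1:4])
m <- cbind(funs4, labels4)

# [ ] keeps the list wrapper; [[ ]] returns the function itself.
wrapped <- m[3, 1]
f       <- m[[3, 1]]
result  <- f()  # y is never used, so calling without arguments is fine
```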

BrodieG answered Oct 08 '22 17:10