
File and directory structure of an R project

In common programming languages like Java, each file normally corresponds to a class.

I just started with R. I'd like to build a little program and want to create a file and directory structure like this:

Main.R # the main control script 
MyClass.R # A class that is referenced from within Main.R 
ProcessData.R # Another class that uses an object of MyClass.R as input

So I'd like to do something like this (pseudo code):

Main.R

myc <- new MyClass # create a new instance of MyClass from within Main.R
pd <- new ProcessData 
pd$processMyClass( myc ) # call a method in ProcessData that processes the myc object in some way

So this is rather abstract, but I just wanted to know if this is in principle possible in R.

UPDATE: I need to be more specific. Hence the question: how would you translate the following Java program into an R program while keeping the same number of files and the same structure as this toy program?

Main.java:

public class Main {
    public static void main( String[] args ) {
        MyClass myc = new MyClass("SampleWord");
        ProcessData pd = new ProcessData();
        pd.processData( myc );
    }
}

MyClass.java

class MyClass {

    public String word;

    public MyClass( String word ) {
        this.word = word;
    }
}

ProcessData.java

class ProcessData {

    public void processData( MyClass myc ) {
        System.out.println( "pd.processData = " + myc.word );
    }

}
toom asked Apr 20 '13 12:04



2 Answers

Class systems

Check out the three class systems in R: S3, S4, and Reference classes.

## S3 methods, Section 5 of
RShowDoc("R-lang")

## S4 classes
?Classes
?Methods

## Reference classes
?ReferenceClasses

With a Java background you'll be tempted to go with reference classes, but these have 'reference semantics' and action at a distance (changing one object changes every other variable that refers to the same object), whereas most R users expect 'copy on change' semantics. One can make great progress with S3 classes, but a more disciplined approach would, in my opinion, adopt S4. Features of S4 will surprise you, in part because the class system is closer to the Common Lisp Object System (CLOS) than to Java.

There are other opinions and options.
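Since S3 is mentioned above but not shown elsewhere in this answer, here is a minimal sketch of how the toy program might look in S3. The names MyClass and processData mirror the question; storing the word in a list element is just one possible layout, not the only way to do it.

```r
## Hypothetical S3 version of the toy program; names mirror the question.

## constructor: an S3 object is an ordinary R value with a class attribute
MyClass <- function(word = character()) {
    stopifnot(is.character(word))
    structure(list(word = word), class = "MyClass")
}

## generic: dispatches on the class of its first argument
processData <- function(x, ...) UseMethod("processData")

## method for MyClass, found through the naming convention generic.class
processData.MyClass <- function(x, ...) {
    cat("processData(MyClass) =", x$word, "\n")
    invisible(x)
}

myc <- MyClass("SampleWord")
processData(myc)
```

S3 does no checking of the object's structure beyond what the constructor does, which is the lack of discipline referred to above.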

Basic implementation

I'm not really sure what your design goal with 'ProcessData' is; I would implement your two classes as a class, a generic, and a method for the generic that operates on the MyClass class.

## definition and 'low-level' constructor
.MyClass <- setClass("MyClass", representation(word="character"))

## definition of a generic
setGeneric("processData", function(x, ...) standardGeneric("processData"))

setMethod("processData", "MyClass", function(x, ...) {
    cat("processData(MyClass) =", x@word, "\n")
})

This is complete and fully functional:

> myClass <- .MyClass(word="hello world")
> processData(myClass)
processData(MyClass) = hello world 

The three definitions might be placed in two files, "AllGenerics.R" and "MyClass.R" (including the method), or in three files, "AllGenerics.R", "AllClasses.R", and "processData-methods.R" (note that methods are associated with generics, and dispatch on class).
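As a rough sketch of the three-file layout, the snippet below writes the three files into a temporary directory (only so that the example is self-contained) and then does what a Main.R script might do: source them in dependency order and use the class. The temporary directory and the loop are assumptions made for illustration; in a real project the files would simply sit next to Main.R.

```r
library(methods)

## For a self-contained demo, create the three files in a temporary directory.
proj <- tempfile("myproject")
dir.create(proj)

writeLines('setGeneric("processData", function(x, ...) standardGeneric("processData"))',
           file.path(proj, "AllGenerics.R"))
writeLines('.MyClass <- setClass("MyClass", representation(word="character"))',
           file.path(proj, "AllClasses.R"))
writeLines(c(
    'setMethod("processData", "MyClass", function(x, ...) {',
    '    cat("processData(MyClass) =", x@word, "\\n")',
    '})'),
    file.path(proj, "processData-methods.R"))

## What Main.R might contain: source generics and classes before the
## methods that refer to them, then use the definitions.
for (f in c("AllGenerics.R", "AllClasses.R", "processData-methods.R"))
    source(file.path(proj, f))

myc <- .MyClass(word = "SampleWord")
processData(myc)
```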

Additional implementation

One would normally add a more user-friendly constructor, e.g., providing hints to the user about expected data types or performing complex argument initialization steps

MyClass <- function(word=character(), ...)
{
    .MyClass(word=word, ...)
}
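To illustrate what "argument initialization steps" could mean, here is a sketch of a constructor that does a little more work. The specific steps (accepting a factor, trimming whitespace, a custom error message) are made up for this example and are not required by S4.

```r
library(methods)

.MyClass <- setClass("MyClass", representation(word="character"))

MyClass <- function(word = character(), ...) {
    ## hypothetical convenience: accept a factor and convert it
    if (is.factor(word))
        word <- as.character(word)
    ## fail early with a message that names the expected type
    if (!is.character(word))
        stop("'word' must be a character vector, got ", class(word)[1])
    ## hypothetical normalization: strip leading / trailing whitespace
    .MyClass(word = trimws(word), ...)
}

MyClass(factor(c("hello", "world")))@word
```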

Typically one wants a slot accessor, rather than direct slot access. This can be a simple function (as illustrated) or a generic + method.

word <- function(x, ...) x@word
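For comparison, the generic + method form of the same accessor might look like the sketch below. It is an alternative to the plain function, not an addition to it; the generic form pays off when several classes need their own word() method.

```r
library(methods)

.MyClass <- setClass("MyClass", representation(word="character"))

## accessor as a generic ...
setGeneric("word", function(x, ...) standardGeneric("word"))

## ... with a method for MyClass
setMethod("word", "MyClass", function(x, ...) x@word)

word(.MyClass(word = "hello world"))
```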

If the slot is to be updated, then one writes a replacement function or method. The function or method usually has three arguments, the object to be updated, possible additional arguments, and the value to update the object with. Here's a generic + method implementation

setGeneric("word<-", function(x, ..., value) standardGeneric("word<-"))

setReplaceMethod("word", c("MyClass", "character"), function(x, ..., value) {
    ## note double dispatch on x=MyClass, value=character
    x@word <- value
    x
})

A somewhat tricky alternative implementation is

setReplaceMethod("word", c("MyClass", "character"), function(x, ..., value) {
    initialize(x, word=value)
})

which uses the initialize generic and default method as a copy constructor; this can be efficient if updating multiple slots at the same time.
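A small sketch of the multi-slot case, using a hypothetical two-slot class Entry (with slots word and lang) that is not part of the code above: a single initialize() call returns a copy with both slots replaced, and the original is left alone.

```r
library(methods)

## hypothetical two-slot class, used only for this illustration
.Entry <- setClass("Entry", representation(word="character", lang="character"))

e1 <- .Entry(word = "colour", lang = "en-GB")

## one call updates both slots and returns a modified copy
e2 <- initialize(e1, word = "color", lang = "en-US")

e1@word   # original is unchanged
e2@word
e2@lang
```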

Because the class is seen by users, one wants to display it in a user-friendly way using a 'show' method, for which a generic (getGeneric("show")) already exists

setMethod("show", "MyClass", function(object) {
    cat("class:", class(object), "\n")
    cat("word:", word(object), "\n")
})

And now our user session looks like

> myClass
class: MyClass 
word: hello world 
> word(myClass)
[1] "hello world"
> word(myClass) <- "goodbye world"
> processData(myClass)
processData(MyClass) = goodbye world

Efficiency

R works efficiently on vectors; S4 classes are no exception. So the design is that each slot of a class represents a column spanning many rows, rather than the element of a single row. We're expecting the slot 'word' to typically contain a vector of length much greater than 1, and for operations to act on all elements of the vector. So one would write methods with this in mind, e.g., modifying the show method to

setMethod("show", "MyClass", function(object) {
    cat("class:", class(object), "\n")
    cat("word() length:", length(word(object)), "\n")
})

Here are larger data objects (using files on my Linux system)

> amer <- MyClass(readLines("/usr/share/dict/american-english"))
> brit <- MyClass(readLines("/usr/share/dict/british-english"))
> amer
class: MyClass 
word() length: 99171 
> brit
class: MyClass 
word() length: 99156 
> sum(word(amer) %in% word(brit))
[1] 97423
> amer_uc <- amer  ## no copy, but marked to be copied if either changed
> word(amer_uc) <- toupper(word(amer_uc))  ## two distinct objects

and all of this is quite performant.
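In the same spirit, methods can be written so that they operate on the whole vector at once. The sketch below adds a length() method and a hypothetical wordWidth() generic (the name is invented for this example) that returns the number of characters of every word in one vectorized call.

```r
library(methods)

.MyClass <- setClass("MyClass", representation(word="character"))
word <- function(x, ...) x@word

## length() already dispatches on S4 classes; give it a MyClass method
setMethod("length", "MyClass", function(x) length(word(x)))

## hypothetical generic: characters per word, computed for all words at once
setGeneric("wordWidth", function(x, ...) standardGeneric("wordWidth"))
setMethod("wordWidth", "MyClass", function(x, ...) nchar(word(x)))

x <- .MyClass(word = c("apple", "fig", "banana"))
length(x)
wordWidth(x)
```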

Hazards of reference class 'action-at-a-distance'

Let's rewind to a simpler implementation of the S4 class, with direct slot access and no fancy constructors. Here's the American dictionary and a copy, transformed to upper case

.MyClass <- setClass("MyClass", representation(word="character"))
amer <- .MyClass(word=readLines("/usr/share/dict/american-english"))
amer_uc <- amer
amer_uc@word <- toupper(amer_uc@word)

Note that we've upper-cased amer_uc but not amer:

> amer@word[99 + 1:10]
 [1] "Adana"      "Adar"       "Adar's"     "Addams"     "Adderley"  
 [6] "Adderley's" "Addie"      "Addie's"    "Addison"    "Adela"     
> amer_uc@word[99 + 1:10]
 [1] "ADANA"      "ADAR"       "ADAR'S"     "ADDAMS"     "ADDERLEY"  
 [6] "ADDERLEY'S" "ADDIE"      "ADDIE'S"    "ADDISON"    "ADELA"     

This is really what R users are expecting: I've created a separate object and modified it; the original object is unmodified. This is an assertion on my part; maybe I don't know what R users expect. I'm assuming an R user isn't really paying attention to which class system an object is built on (for example, whether it is a reference class), but thinks it's just another R object like an integer() vector or a data.frame or the return value of lm().

In contrast, here's a minimal implementation of a reference class, and similar operations

.MyRefClass <- setRefClass("MyRefClass", fields = list(word="character"))
amer <- .MyRefClass(word=readLines("/usr/share/dict/american-english"))
amer_uc <- amer
amer_uc$word <- toupper(amer_uc$word)

But now we've changed both amer and amer_uc! Completely expected by C or Java programmers, but not by R users.

> amer$word[99 + 1:10]
 [1] "ADANA"      "ADAR"       "ADAR'S"     "ADDAMS"     "ADDERLEY"  
 [6] "ADDERLEY'S" "ADDIE"      "ADDIE'S"    "ADDISON"    "ADELA"     
> amer_uc$word[99 + 1:10]
 [1] "ADANA"      "ADAR"       "ADAR'S"     "ADDAMS"     "ADDERLEY"  
 [6] "ADDERLEY'S" "ADDIE"      "ADDIE'S"    "ADDISON"    "ADELA"     
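If an independent object is what you want from a reference class, the copy() method that every reference class object provides creates one. A minimal sketch, using a short vector in place of the dictionary file:

```r
library(methods)

.MyRefClass <- setRefClass("MyRefClass", fields = list(word="character"))

a <- .MyRefClass(word = c("Adana", "Adar"))
b <- a$copy()                 # an independent object, not another name for 'a'
b$word <- toupper(b$word)

a$word                        # still mixed case
b$word                        # upper case
```

The catch is that the caller has to remember to ask for the copy; plain assignment still shares the object.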
Martin Morgan answered Oct 08 '22 10:10


Reference Classes

Below we attempt to replicate the Java code in the question in R as closely as we can. Of the three built-in R class systems (S3, S4, Reference Classes), Reference Classes seem the closest to that style. Reference Classes are the most recent class system to be added to R, and their rapid uptake may be due to Java programmers coming to R who are familiar with that style.

(If you create a package out of this then omit all the source statements.)

Main.R file:

source("MyClass.R")
source("ProcessData.R")

main <- function() {
    myc <- new("MyClass", word = "SampleWord")
    pd <- new("ProcessData")
    cat("pd$processData =", pd$processData(myc), "\n")
}

MyClass.R file:

setRefClass("MyClass", 
    fields = list(word = "character")
)

ProcessData.R file:

setRefClass("ProcessData",
    fields = list(myc = "MyClass"),
    methods = list(
        processData = function(myc) myc$word
    )
)

To run:

source("Main.R")
main()

proto package

The proto package implements the prototype model of object-oriented programming, which originated with the Self programming language, exists to some extent in JavaScript and Lua, and is in particular the basis of the Io language. proto can readily emulate this style (as discussed in the Traits section of the proto vignette):

Main.R file:

library(proto)  # must be attached before the sourced files call proto()

source("MyClass.R")
source("ProcessData.R")

main <- function() {
    myc <- MyClass$new("SampleWord")
    pd <- ProcessData$new()
    cat("pd$processData =", pd$processData(myc), "\n")
}

MyClass.R file:

MyClass <- proto(
    new = function(., word) proto(word = word)
)

ProcessData.R file:

ProcessData <- proto(
    new = function(.) proto(.), 
    processData = function(., myc) myc$word
)

To run:

source("Main.R")
main()

UPDATE: Added proto example.

UPDATE 2: Improved main and MyClass in the reference class example.

G. Grothendieck answered Oct 08 '22 10:10