In common programming languages like Java, each file normally corresponds to a class.
I just started with R. I'd like to build a little program, and I want to create a file and directory structure like this:
Main.R          # the main control script
MyClass.R       # a class that is referenced from within Main.R
ProcessData.R   # another class that uses an object of MyClass.R as input
So I'd like to do something like this (pseudo code):
Main.R
myc <- new MyClass           # create a new instance of MyClass from within Main.R
pd <- new ProcessData
pd$processMyClass( myc )     # call a method in ProcessData that processes the myc object in some way
So this is rather abstract, but I just wanted to know if this is in principle possible in R.
UPDATE: I need to be more specific. So the question is: how would you translate the following Java program into an R program while keeping the same number of files and the same structure as this toy program?
Main.java:
public class Main {
    public static void main( String[] args ) {
        MyClass myc = new MyClass("SampleWord");
        ProcessData pd = new ProcessData();
        pd.processData( myc );
    }
}
MyClass.java:
class MyClass {
    public String word;
    public MyClass( String word ) {
        this.word = word;
    }
}
ProcessData.java:
class ProcessData {
    public void processData( MyClass myc ) {
        System.out.println( "pd.processData = " + myc.word );
    }
}
What is a directory structure? A directory structure is the organization of files into a hierarchy of folders. It should be stable and scalable: it should not fundamentally change, only be added to. Computers have used the folder metaphor for decades as a way to help users keep track of where something can be found.
An R project enables your work to be bundled in a portable, self-contained folder. Within the project, all the relevant scripts, data files, figures/outputs, and history are stored in sub-folders and importantly - the working directory is the project's root folder.
Check out the three class systems in R: S3, S4, and Reference Classes.
## S3 methods, Section 5 of
RShowDoc("R-lang")
## S4 classes
?Classes
?Methods
## Reference classes
?ReferenceClasses
With a Java background you'll be tempted to go with reference classes, but these have 'reference semantics' and action at a distance (changing one object changes another that refers to the same data), whereas most R users expect 'copy on change' semantics. One can make great progress with S3 classes, but a more disciplined approach would, in my opinion, adopt S4. Features of S4 will surprise you, in part because the class system is closer to the Common Lisp Object System (CLOS) than to Java.
There are other opinions and options.
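For comparison, here is a minimal S3 sketch of the same toy program. It is only an illustration of the S3 style mentioned above, not part of the S4 design that follows; storing the data in a list with a `word` element is an assumption made for this example.

```r
## Constructor: a plain function returning a list tagged with a class attribute
MyClass <- function(word = character()) {
    stopifnot(is.character(word))
    structure(list(word = word), class = "MyClass")
}

## Generic: dispatches on the class of its first argument
processData <- function(x, ...) UseMethod("processData")

## Method for objects of class "MyClass"
processData.MyClass <- function(x, ...) {
    cat("processData(MyClass) =", x$word, "\n")
    invisible(x)
}

myc <- MyClass("SampleWord")
processData(myc)
```

S3 performs no checking of the object's contents beyond what the constructor does, which is one reason a more disciplined design might prefer S4.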
I'm not really sure what your design goal with 'ProcessData' is; I would implement your two classes as a class, a generic, and a method for the generic that operates on the MyClass class.
## definition and 'low-level' constructor
.MyClass <- setClass("MyClass", representation(word="character"))
## definition of a generic
setGeneric("processData", function(x, ...) standardGeneric("processData"))
## definition of a method for the generic, dispatching on class MyClass
setMethod("processData", "MyClass", function(x, ...) {
    cat("processData(MyClass) =", x@word, "\n")
})
This is complete and fully functional:
> myClass <- .MyClass(word="hello world")
> processData(myClass)
processData(MyClass) = hello world
The three code lines might be placed in two files, "AllGenerics.R" and "MyClass.R" (including the method) or three files "AllGenerics.R", "AllClasses.R", "processData-methods.R" (note that methods are associated with generics, and dispatch on class).
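As a rough sketch of the three-file layout, the snippet below writes the files into a temporary folder and then sources them the way a Main.R might. The folder name `myproject` and the use of `tempdir()` are assumptions for illustration only; in a real project the files would simply sit next to Main.R.

```r
library(methods)

## Hypothetical project directory; a temporary folder is used here
proj <- file.path(tempdir(), "myproject")
dir.create(proj, showWarnings = FALSE)

writeLines(
    'setGeneric("processData", function(x, ...) standardGeneric("processData"))',
    file.path(proj, "AllGenerics.R"))

writeLines(
    '.MyClass <- setClass("MyClass", representation(word="character"))',
    file.path(proj, "AllClasses.R"))

writeLines(c(
    'setMethod("processData", "MyClass", function(x, ...) {',
    '    cat("processData(MyClass) =", x@word, "\\n")',
    '})'),
    file.path(proj, "processData-methods.R"))

## What a Main.R might do: load classes and generics first, methods last
for (f in c("AllClasses.R", "AllGenerics.R", "processData-methods.R"))
    source(file.path(proj, f))

processData(.MyClass(word = "hello world"))
```

The methods file is sourced last because `setMethod()` needs both the generic and the class to exist already.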
One would normally add a more user-friendly constructor, e.g., providing hints to the user about expected data types or performing complex argument initialization steps
MyClass <- function(word=character(), ...)
{
    .MyClass(word=word, ...)
}
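A slightly fuller constructor might look like the sketch below. The specific rules (factors are coerced, anything else non-character is rejected, `NA` is not allowed) are assumptions chosen for illustration, not requirements from the question.

```r
library(methods)

## Class definition with a validity check; the "no missing values" rule
## is an illustrative assumption
.MyClass <- setClass("MyClass",
    representation(word = "character"),
    validity = function(object) {
        if (anyNA(object@word)) "'word' must not contain NA" else TRUE
    })

## User-facing constructor: coerce factors, give a readable error otherwise
MyClass <- function(word = character(), ...) {
    if (is.factor(word))
        word <- as.character(word)
    if (!is.character(word))
        stop("'word' must be a character vector, got '", class(word)[1], "'")
    .MyClass(word = word, ...)
}

ok <- MyClass(factor(c("a", "b")))          # factor is coerced to character
bad <- try(MyClass(42), silent = TRUE)      # rejected by the constructor
```

The validity function guards every way of creating the object, while the constructor is the place for friendlier coercion and error messages.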
Typically one wants a slot accessor, rather than direct slot access. This can be a simple function (as illustrated) or a generic + method.
word <- function(x, ...) x@word
If the slot is to be updated, then one writes a replacement function or method. The function or method usually has three arguments, the object to be updated, possible additional arguments, and the value to update the object with. Here's a generic + method implementation
setGeneric("word<-", function(x, ..., value) standardGeneric("word<-"))
setReplaceMethod("word", c("MyClass", "character"), function(x, ..., value) {
    ## note double dispatch on x=MyClass, value=character
    x@word <- value
    x
})
A somewhat tricky alternative implementation is
setReplaceMethod("word", c("MyClass", "character"), function(x, ..., value) {
    initialize(x, word=value)
})
which uses the initialize generic and default method as a copy constructor; this can be efficient if updating multiple slots at the same time.
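To make the multiple-slot case concrete, here is a small sketch with a hypothetical two-slot class (`Tagged`, with slots `word` and `lang`, is invented for this example and is not part of the design above).

```r
library(methods)

## Hypothetical two-slot class, used only for this illustration
.Tagged <- setClass("Tagged",
    representation(word = "character", lang = "character"))

x <- .Tagged(word = "hello", lang = "en")

## initialize() on an existing object returns a modified copy;
## both slots are replaced (and validity is checked) in a single call
y <- initialize(x, word = "bonjour", lang = "fr")
```

The original `x` is left untouched, so this behaves like the other copy-on-change updates shown here.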
Because the class is seen by users, one wants to display it in a user-friendly way using a 'show' method, for which a generic (getGeneric("show")) already exists
setMethod("show", "MyClass", function(object) {
    cat("class:", class(object), "\n")
    cat("word:", word(object), "\n")
})
And now our user session looks like
> myClass
class: MyClass
word: hello world
> word(myClass)
[1] "hello world"
> word(myClass) <- "goodbye world"
> processData(myClass)
processData(MyClass) = goodbye world
R works efficiently on vectors; S4 classes are no exception. So the design is that each slot of a class represents a column spanning many rows, rather than the element of a single row. We're expecting the slot 'word' to typically contain a vector of length much greater than 1, and for operations to act on all elements of the vector. So one would write methods with this in mind, e.g., modifying the show method to
setMethod("show", "MyClass", function(object) {
    cat("class:", class(object), "\n")
    cat("word() length:", length(word(object)), "\n")
})
Here are larger data objects (using files on my Linux system)
> amer <- MyClass(readLines("/usr/share/dict/american-english"))
> brit <- MyClass(readLines("/usr/share/dict/british-english"))
> amer
class: MyClass
word() length: 99171
> brit
class: MyClass
word() length: 99156
> sum(word(amer) %in% word(brit))
[1] 97423
> amer_uc <- amer ## no copy, but marked to be copied if either changed
> word(amer_uc) <- toupper(word(amer_uc)) ## two distinct objects
and all of this is quite performant.
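If the dictionary files are not available, the same vectorised style can be tried on data shipped with R. The sketch below is an assumption-laden illustration: it uses the built-in `state.name` vector and adds `length` and `[` methods, which the answer does not define.

```r
library(methods)

.MyClass <- setClass("MyClass", representation(word = "character"))
word <- function(x, ...) x@word

## length() and `[` already exist as generics; add methods for MyClass
setMethod("length", "MyClass", function(x) length(word(x)))
setMethod("[", c("MyClass", "ANY", "missing"),
    function(x, i, j, ..., drop = TRUE) {
        ## subset the whole vector at once, returning a new MyClass
        initialize(x, word = word(x)[i])
    })

states <- .MyClass(word = state.name)    # US state names shipped with R
length(states)
new_states <- states[startsWith(word(states), "New")]
word(new_states)
```

Each operation acts on the whole `word` vector in one step rather than looping over individual elements.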
Let's rewind to a simpler implementation of the S4 class, with direct slot access and no fancy constructors. Here's the American dictionary and a copy, transformed to upper case
.MyClass <- setClass("MyClass", representation(word="character"))
amer <- .MyClass(word=readLines("/usr/share/dict/american-english"))
amer_uc <- amer
amer_uc@word <- toupper(amer_uc@word)
Note that we've upper-cased amer_uc but not amer:
> amer@word[99 + 1:10]
[1] "Adana" "Adar" "Adar's" "Addams" "Adderley"
[6] "Adderley's" "Addie" "Addie's" "Addison" "Adela"
> amer_uc@word[99 + 1:10]
[1] "ADANA" "ADAR" "ADAR'S" "ADDAMS" "ADDERLEY"
[6] "ADDERLEY'S" "ADDIE" "ADDIE'S" "ADDISON" "ADELA"
This is really what R users are expecting -- I've created a separate object and modified it; the original object is unmodified. This is an assertion on my part; maybe I don't know what R users expect. I'm assuming an R user isn't really paying attention to the fact that this is a reference class, but thinks it's just another R object like an integer() vector or a data.frame or the return value of lm().
In contrast, here's a minimal implementation of a reference class, and similar operations
.MyRefClass <- setRefClass("MyRefClass", fields = list(word="character"))
amer <- .MyRefClass(word=readLines("/usr/share/dict/american-english"))
amer_uc <- amer
amer_uc$word <- toupper(amer_uc$word)
But now we've changed both amer and amer_uc! Completely expected by C or Java programmers, but not by R users.
> amer$word[99 + 1:10]
[1] "ADANA" "ADAR" "ADAR'S" "ADDAMS" "ADDERLEY"
[6] "ADDERLEY'S" "ADDIE" "ADDIE'S" "ADDISON" "ADELA"
> amer_uc$word[99 + 1:10]
[1] "ADANA" "ADAR" "ADAR'S" "ADDAMS" "ADDERLEY"
[6] "ADDERLEY'S" "ADDIE" "ADDIE'S" "ADDISON" "ADELA"
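If an independent object is wanted from a reference class, the built-in `copy()` method can be used instead of plain assignment. A minimal sketch, with a short made-up word vector standing in for the dictionary file:

```r
library(methods)

.MyRefClass <- setRefClass("MyRefClass", fields = list(word = "character"))

a <- .MyRefClass(word = c("adana", "adar"))
b <- a           # a second name for the same object
d <- a$copy()    # an independent copy

b$word <- toupper(b$word)
a$word           # changed, because a and b refer to the same object
d$word           # unchanged, because d was copied before the update
```

This makes the copy explicit, which is the opposite default from the S4 example, where assignment alone is enough to get a separate object.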
Reference Classes
Below we attempt to replicate the Java code in the question in R as closely as we can. Of the three built-in R class systems (S3, S4, Reference Classes), Reference Classes seem closest to that style. Reference Classes are the most recent class system added to R, and their rapid uptake may be due to Java programmers coming to R who are familiar with that style.
(If you create a package out of this then omit all the source statements.)
Main.R file:
source("MyClass.R")
source("ProcessData.R")
main <- function() {
    myc <- new("MyClass", word = "SampleWord")
    pd <- new("ProcessData")
    cat("pd$processData =", pd$processData(myc), "\n")
}
MyClass.R file:
setRefClass("MyClass",
    fields = list(word = "character")
)
ProcessData.R file:
setRefClass("ProcessData",
    fields = list(myc = "MyClass"),
    methods = list(
        processData = function(myc) myc$word
    )
)
To run:
source("Main.R")
main()
proto package
The proto package implements the prototype model of object-oriented programming that originated with the Self programming language, exists to some extent in JavaScript and Lua, and is the basis of the Io language. proto can readily emulate this style (as discussed in the Traits section of the proto vignette):
Main.R file:
library(proto)   # load first: the sourced files call proto() immediately
source("MyClass.R")
source("ProcessData.R")
main <- function() {
    myc <- MyClass$new("SampleWord")
    pd <- ProcessData$new()
    cat("pd$processData =", pd$processData(myc), "\n")
}
MyClass.R file:
MyClass <- proto(
    new = function(., word) proto(word = word)
)
ProcessData.R file:
ProcessData <- proto(
    new = function(.) proto(.),
    processData = function(., myc) myc$word
)
To run:
source("Main.R")
main()
UPDATE: Added proto example.
UPDATE 2: Improved main and MyClass in the reference class example.