Let's say I want to define two classes, Sentence and Word. Each word object has a character string and a part of speech (pos). Each sentence contains some number of words and has an additional slot for data.
The Word class is straightforward to define:
wordSlots <- list(word = "character", pos = "character")
wordProto <- list(word = "", pos = "")
setClass("Word", slots = wordSlots, prototype = wordProto)
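# default arguments let Word() be called with no arguments, as the Sentence prototypes below do
Word <- function(word = "", pos = "") new("Word", word = word, pos = pos)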
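As a quick sketch (the values "run" and "verb" are made up for illustration), the slot types declared above are enforced when an object is created. This is the kind of type checking the question wants for the words slot:

```r
library(methods)

# Word class from the question, with defaults so Word() can be called without arguments
setClass("Word",
         slots = list(word = "character", pos = "character"),
         prototype = list(word = "", pos = ""))
Word <- function(word = "", pos = "") new("Word", word = word, pos = pos)

w <- Word("run", "verb")
print(w@word)  # "run"

# new() refuses a numeric value for a slot declared as "character"
bad <- tryCatch(new("Word", word = 1, pos = "verb"),
                error = function(e) "rejected")
print(bad)     # "rejected"
```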
Now I want to make a Sentence class which can contain some Words and some numerical data.
If I define the Sentence class like this:
sentenceSlots <- list(words = "Word", stats = "numeric")
sentenceProto <- list(words = Word(), stats = 0)
setClass("Sentence", slots = sentenceSlots, prototype = sentenceProto)
Then the sentence can contain only one word. I could define it with many slots, one for each word, but then its length would be fixed.
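The limitation can be seen directly in a small sketch (the example words are illustrative): with the slot typed as "Word", a list of two words is not itself a Word, so S4 rejects the assignment.

```r
library(methods)

setClass("Word", slots = list(word = "character", pos = "character"),
         prototype = list(word = "", pos = ""))
Word <- function(word = "", pos = "") new("Word", word = word, pos = pos)

# First attempt from the question: the 'words' slot is typed "Word"
setClass("Sentence", slots = list(words = "Word", stats = "numeric"),
         prototype = list(words = Word(), stats = 0))

s <- new("Sentence", words = Word("Hello", "interjection"))

# A list of two Word objects is not a "Word", so the slot assignment fails
res <- tryCatch({
  s@words <- list(Word("Hello", "interjection"), Word("world", "noun"))
  "accepted"
}, error = function(e) "rejected")
print(res)  # "rejected"
```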
However, if I define the Sentence class like this:
sentenceSlots <- list(words = "list", stats = "numeric")
sentenceProto <- list(words = list(Word()), stats = 0)
setClass("Sentence", slots = sentenceSlots, prototype = sentenceProto)
it can contain as many words as I want, but the words slot can also hold objects that are not of class Word.
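A minimal sketch of the problem (the elements 42 and "not a word" are arbitrary stand-ins): the "list" slot type only checks that the slot is a list, not what the list contains.

```r
library(methods)

setClass("Word", slots = list(word = "character", pos = "character"),
         prototype = list(word = "", pos = ""))
Word <- function(word = "", pos = "") new("Word", word = word, pos = pos)

# Second attempt from the question: 'words' is a plain list
setClass("Sentence", slots = list(words = "list", stats = "numeric"),
         prototype = list(words = list(Word()), stats = 0))

# Nothing stops non-Word elements from being stored
s <- new("Sentence", words = list(Word("Hello", "interjection"), 42, "not a word"))
is_word <- vapply(s@words, function(el) is(el, "Word"), logical(1))
print(is_word)  # TRUE FALSE FALSE
```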
Is there a way to restrict the slot to Word objects only? This would be similar to C++, where a vector holds objects of a single type, such as std::vector<Word>.
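One common plain-S4 way to get this restriction, sketched here as an assumption rather than taken from the answers in this thread, is to keep the "list" slot and add a validity method that rejects non-Word elements. new() runs the check automatically, but direct slot assignment with @<- only checks that the slot is still a list, so validObject() has to be called after such modifications:

```r
library(methods)

setClass("Word", slots = list(word = "character", pos = "character"),
         prototype = list(word = "", pos = ""))
Word <- function(word = "", pos = "") new("Word", word = word, pos = pos)

# Sentence keeps a plain list; the validity method enforces the element type
setClass("Sentence", slots = list(words = "list", stats = "numeric"),
         prototype = list(words = list(), stats = 0))

setValidity("Sentence", function(object) {
  ok <- vapply(object@words, function(el) is(el, "Word"), logical(1))
  if (all(ok)) TRUE
  else paste("element(s)", paste(which(!ok), collapse = ", "),
             "of 'words' are not Word objects")
})

good <- new("Sentence", words = list(Word("Hello", "interjection"), Word("world", "noun")))
n_good <- length(good@words)  # 2

# new() calls validObject(), so a non-Word element is refused here
bad <- tryCatch(new("Sentence", words = list(Word("Hello"), 42)),
                error = function(e) conditionMessage(e))

# Caveat: @<- does not run the validity method, so check explicitly afterwards
good@words[[3]] <- "oops"
chk <- tryCatch(validObject(good), error = function(e) "invalid")
print(chk)  # "invalid"
```

The design trade-off is that the check runs in R code over every element, which is fine for modest sentences but is exactly the per-element overhead the vectorized 'Words' approach below avoids.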
The S4 system in R is a system for object-oriented programming. Confusingly, R supports at least three different systems for object-oriented programming: S3, S4, and S5 (also known as reference classes).
S4 provides a formal approach to functional OOP. The underlying ideas are similar to S3 (the topic of Chapter 13), but the implementation is much stricter and makes use of specialised functions for creating classes (setClass()), generics (setGeneric()), and methods (setMethod()).
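A minimal sketch of the three specialised functions working together; the class Person, its slot name, and the generic greet are hypothetical names chosen for illustration:

```r
library(methods)

# setClass(): define a class with a typed slot (hypothetical example class)
setClass("Person", slots = list(name = "character"))

# setGeneric(): declare a generic function that dispatches on its argument
setGeneric("greet", function(x, ...) standardGeneric("greet"))

# setMethod(): provide the implementation of greet() for Person objects
setMethod("greet", "Person", function(x, ...) paste("Hello,", x@name))

p <- new("Person", name = "Ada")
msg <- greet(p)
print(msg)  # "Hello, Ada"
```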
Objects in R are instances of classes. Everything in R is an object, and objects can also carry attributes such as class, dimnames, and names.
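A small illustration of attributes on ordinary (non-S4) objects; the vector and matrix values are arbitrary:

```r
# Named numeric vector: 'names' is stored as an attribute
x <- c(a = 1, b = 2)
print(attributes(x))

# Matrix: 'dim' and 'dimnames' are attributes as well
m <- matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("c1", "c2")))
print(names(attributes(m)))
print(dim(m))  # 2 2
```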
Remembering that R works best with vectors, a first step is to think of 'Words' rather than 'Word':
## constructor, accessors, subset (also need [[, [<-, [[<- methods)
.Words <- setClass("Words",
representation(words="character", parts="character"))
words <- function(x) x@words
parts <- function(x) x@parts
setMethod("length", "Words", function(x) length(words(x)))
setMethod("[", c("Words", "ANY", "missing"), function(x, i, j, ...) {
initialize(x, words=words(x)[i], parts=parts(x)[i], ...)
})
## validity
setValidity("Words", function(object) {
if (length(words(object)) == length(parts(object)))
TRUE
else
"'words()' and 'parts()' are not the same length"
})
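A short usage sketch of the Words class above, restated so it runs on its own (the example words and part-of-speech tags are made up; the [ method here declares the generic's drop argument explicitly). The validity method keeps the two slots in step, and [ subsets both at once:

```r
library(methods)

.Words <- setClass("Words", representation(words = "character", parts = "character"))
words <- function(x) x@words
parts <- function(x) x@parts
setMethod("length", "Words", function(x) length(words(x)))
setMethod("[", c("Words", "ANY", "missing"), function(x, i, j, ..., drop = TRUE) {
  # subset both slots in parallel; initialize() also re-runs the validity check
  initialize(x, words = words(x)[i], parts = parts(x)[i])
})
setValidity("Words", function(object) {
  if (length(words(object)) == length(parts(object))) TRUE
  else "'words()' and 'parts()' are not the same length"
})

w <- .Words(words = c("the", "cat", "sat"), parts = c("DT", "NN", "VBD"))
print(length(w))       # 3
print(words(w[2:3]))   # "cat" "sat"

# Mismatched slot lengths are rejected by the validity method
err <- tryCatch(.Words(words = c("a", "b"), parts = "A"),
                error = function(e) "invalid")
print(err)             # "invalid"
```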
@nicola's suggestion of keeping a list of words has been formalized in the IRanges package (actually S4Vectors, in the 'devel' / 3.0 branch of Bioconductor). There, a 'SimpleList' takes the 'naive' approach of requiring all elements of the list to have the same class, whereas a 'CompressedList' behaves similarly but is implemented as a vector-like object (one with length(), [, and [[ methods) that is 'partitioned' into groups, either by end or by width.
library(IRanges)
.Sentences = setClass("Sentences",
contains="CompressedList",
prototype=prototype(elementType="Words"))
One would then write a more user-friendly constructor, but the basic functionality is:
## 0 Sentences
.Sentences()
## 1 sentence of 0 words
.Sentences(unlistData=.Words(), partitioning=PartitioningByEnd(0))
## 3 sentences of 2, 0, and 3 words
s3 <- .Sentences(unlistData=.Words(words=letters[1:5], parts=LETTERS[1:5]),
partitioning=PartitioningByEnd(c(2, 2, 5)))
leading to:
> s3[[1]]
An object of class "Words"
Slot "words":
[1] "a" "b"
Slot "parts":
[1] "A" "B"
> s3[[2]]
An object of class "Words"
Slot "words":
character(0)
Slot "parts":
character(0)
> s3[[3]]
An object of class "Words"
Slot "words":
[1] "c" "d" "e"
Slot "parts":
[1] "C" "D" "E"
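To make the partitioning concrete without IRanges, here is a base-R sketch (not the IRanges implementation) of how the end positions c(2, 2, 5) used for s3 carve five unlisted words into sentences of 2, 0 and 3 words:

```r
# Flat ("unlisted") words shared by all sentences, as in s3
unlisted <- letters[1:5]
ends <- c(2L, 2L, 5L)                         # end position of each sentence

starts <- c(1L, ends[-length(ends)] + 1L)     # 1 3 3
widths <- ends - starts + 1L                  # 2 0 3

# Label each flat element with its sentence, keeping empty sentences as levels
group <- factor(rep(seq_along(ends), widths), levels = seq_along(ends))
sentences <- split(unlisted, group)
print(sentences)
```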
Notice that some typical operations are fast because they operate on the 'unlisted' elements without creating or destroying S4 instances, e.g., coercing all words to upper case:
setMethod(toupper, "Words", function(x) { x@words <- toupper(x@words); x })
setMethod(toupper, "Sentences", function(x) relist(toupper(unlist(x)), x))
This is 'fast' for large collections of sentences because unlist / relist is really only a slot access and the creation of a single instance of 'Words'. Scalable Genomics with R and Bioconductor outlines this and other strategies.
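The same unlist / transform / relist pattern can be sketched in base R with a plain list and utils::relist (a different function from the IRanges relist method above). One vectorized toupper() call replaces one call per sentence. Unlike a CompressedList, a plain list is not stored flat, so unlist() here does copy the data; the sketch shows only the shape of the idea:

```r
# Sentences as a plain list of character vectors (one element per sentence)
sentences <- list(c("a", "b"), character(0), c("c", "d", "e"))

# One vectorised call on the flattened data, then restore the grouping
upper_fast <- utils::relist(toupper(unlist(sentences)), sentences)

# Equivalent per-sentence version: one toupper() call per sentence
upper_loop <- lapply(sentences, toupper)

print(identical(upper_fast, upper_loop))  # TRUE
```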
In an answer, @nicola says 'R is not perfectly suited for OO programming style', but it's probably more helpful to realize that R's S4 object-oriented style differs from that of C++ and Java, just as R differs from C. In particular, it's really valuable to keep thinking in terms of vectors when working with S4: Words rather than Word, People rather than Person, and so on.