Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I make slot to be filled with multiple same-type objects in R?

Tags:

object

class

r

slot

Let's say I want to define two classes classes, Sentence and Word. Each word object has a character string and a part of speech (pos). Each sentence contains some number of words and has an additional slot for data.

The Word class is straightforward to define.

wordSlots <- list(word = "character", pos = "character")
wordProto <- list(word = "", pos = "")
setClass("Word", slots = wordSlots, prototype = wordProto)    
Word <- function(word, pos) new("Word", word=word, pos=pos)

Now I want to make a Sentence class which can contain some Words and some numerical data.

If I define the Sentence class as so:

sentenceSlots <- list(words = "Word", stats = "numeric")
sentenceProto <- list(words = Word(), stats = 0)
setClass("Sentence", slots = sentenceSlots, prototype = sentenceProto)

Then the sentence can contain only one word. I could obviously define it with many slots, one for each word, but then it will be limited in length.

However, if I define the Sentence class like this:

sentenceSlots <- list(words = "list", stats = "numeric")
sentenceProto <- list(words = list(Word()), stats = 0)
setClass("Sentence", slots = sentenceSlots, prototype = sentenceProto)

it can contain as many words as I want, but the slot words can contain objects which are not of the class Word.

Is there a way to accomplish this? This would be similar to the C++ thing where you can have a vector of objects of the same type.

like image 382
Will Beason Avatar asked Sep 15 '14 19:09

Will Beason


People also ask

What is an S4 object in R?

The S4 system in R is a system for object oriented programing. Confusingly, R has support for at least 3 different systems for object oriented programming: S3, S4 and S5 (also known as reference classes).

What is S4 function?

S4 provides a formal approach to functional OOP. The underlying ideas are similar to S3 (the topic of Chapter 13), but implementation is much stricter and makes use of specialised functions for creating classes ( setClass() ), generics ( setGeneric() ), and methods ( setMethod() ).

What are the objects in R language?

Objects in R Objects are the instance of the class. Also, everything in R is an object and to know more look at Data types in R. They also can have their attributes like class, attributes,dimnnames, names, etc.


1 Answers

Remembering that R works well on vectors, a first step is to think of 'Words' rather than 'Word'

## constructor, accessors, subset (also need [[, [<-, [[<- methods)
.Words <- setClass("Words",
    representation(words="character", parts="character"))
words <- function(x) x@words
parts <- function(x) x@parts
setMethod("length", "Words", function(x) length(words(x)))
setMethod("[", c("Words", "ANY", "missing"), function(x, i, j, ...) {
    initialize(x, words=words(x)[i], parts=parts(x)[i], ...)
})

## validity
setValidity("Words", function(object) {
    if (length(words(object)) == length(parts(object)))
        NULL
    else
        "'words()' and 'parts()' are not the same length"
})

@nicola's suggestion that one have a list of words has been formalized in the IRanges package (actually, S4Vectors in the 'devel' / 3.0 branch of Bioconductor), where a 'SimpleList' takes the 'naive' approach of requiring all elements of the list to have the same class, whereas a 'CompressedList' has similar behavior but actually is implemented as a vector-like object (one with a length(), [, and [[ methods) that is 'partitioned' (either by end or width) into groups.

library(IRanges)
.Sentences = setClass("Sentences",
    contains="CompressedList",    
    prototype=c(elementType="Words"))

One would then write a more user-friendly constructor, but the basic functionality is

## 0 Sentences
.Sentences()
## 1 sentence of 0 words
.Sentences(unlistData=.Words(), partitioning=PartitioningByEnd(0))
## 3 sentences of 2, 0, and 3 words
s3 <- .Sentences(unlistData=.Words(words=letters[1:5], parts=LETTERS[1:5]), 
    partitioning=PartitioningByEnd(c(2, 2, 5)))

leading to

> s3[[1]]
An object of class "Words"
Slot "word":
[1] "a" "b"

Slot "part":
[1] "A" "B"

> s3[[2]]
An object of class "Words"
Slot "word":
character(0)

Slot "part":
character(0)

> s3[[3]]
An object of class "Words"
Slot "word":
[1] "c" "d" "e"

Slot "part":
[1] "C" "D" "E"

Notice that some typical operations are fast because they can operate on the 'unlisted' elements without creating or destroying S4 instances, e.g., coercing all 'words' to upper case

setMethod(toupper, "Words", function(x) { x@word <- toupper(x@word); x })
setMethod(toupper, "Sentences", function(x) relist(toupper(unlist(x)), x))

This is 'fast' for large collections of sentences because unlist / relist is really on a slot access and creation of a single instance of 'Words'. Scalable Genomics with R and Bioconductor outlines this and other strategies.

In an answer @nicola says 'R is not perfectly suited for OO programming style' but it's probably more helpful to realize that R's S4 object oriented style differs from C++ and Java, just as R differs from C. In particular it's really valuable to continue thinking in terms of vectors when working with S4 -- Words rather than Word, People rather than Person...

like image 141
Martin Morgan Avatar answered Sep 22 '22 10:09

Martin Morgan