Classes in R from a Python background

Tags: oop, r

I am a Python programmer, and this is my first day working with R.

I am trying to write a class with a constructor and three methods, and I am struggling.

In Python, it's easy:

class MyClass:
    def __init__(self):
        self.variableA = 1
        self.variableB = 2

    def hello(self):
        return "Hello"

    def goodbye(self):
        return "Goodbye"

    def ohDear(self):
        return "Have no clue"

I can't find anything that shows me how to do something as simple as this in R. I would appreciate it if someone could show me one way to do this.

Many thanks

Sherlock asked Jul 19 '12 12:07


3 Answers

R actually has lots of different object oriented implementations. The three native ones (as mentioned by @jthetzel) are S3, S4 and Reference Classes.

S3 is a lightweight system that allows you to overload a function based upon the class of the first argument.
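
As a rough illustration of that dispatch, here is a minimal S3 sketch of the class from the question. The constructor name new_my_class and the hello.default fallback are my own illustrative choices, not part of any standard API.

```r
# An S3 object is just an ordinary value (here a list) with a class attribute.
new_my_class <- function(variableA = 1, variableB = 2) {
  structure(list(variableA = variableA, variableB = variableB),
            class = "MyClass")
}

# The generic: UseMethod() dispatches on the class of the first argument.
hello <- function(obj, ...) UseMethod("hello")

# The method for MyClass objects, and a fallback for everything else.
hello.MyClass <- function(obj, ...) "Hello"
hello.default <- function(obj, ...) "Hello from the default method"

obj <- new_my_class()
hello(obj)      # uses hello.MyClass
hello(42)       # uses hello.default
obj$variableA   # fields are plain list elements
```

Note that there is no formal class definition here: the class is nothing more than the label attached by structure().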

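Reference Classes are designed to more closely resemble classes from other programming languages: objects are mutable and methods belong to the object. They are built on top of S4 classes, which offer formal class definitions and generic-function dispatch but are more unwieldy for this style of programming.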

The R.oo package provides another system, and the proto package allows prototype-oriented programming, which is like lightweight OOP. There was a sixth system in the OOP package, but that is now defunct. The more recent R6 package "is a simpler, faster, lighter-weight alternative to R's built-in reference classes".

For new projects, you'll usually only want to use S3 and Reference Classes (or possibly R6).

Python classes most easily translate to Reference Classes. They are relatively new and (until John Chambers finishes his book on them) the best reference is the ?ReferenceClasses page. Here's an example to get you started.

To define a class, you call setRefClass. The first argument is the name of the class, and by convention this should be the same as the variable that you assign the result to. You also need to pass lists to the arguments "fields" and "methods".

There are a few quirks.

  • If you don't want to specify what variable type a field should have, pass "ANY" as the value in the list of fields.
  • Any constructor logic needs to be in an optional function called initialize.
  • If the first line of a method is a string, it is interpreted as documentation for that method.
  • Inside a method, if you want to assign to a field, use the superassignment operator (<<-), often called global assignment. A plain <- would only create a local variable.

This creates a class generator:

MyClass <- setRefClass(
  "MyClass",
  fields = list(
    x = "ANY",
    y = "numeric",
    z = "character"
  ),
  methods = list(
    initialize = function(x = NULL, y = 1:10, z = letters)
    {
      "This method is called when you create an instance of the class."
      x <<- x
      y <<- y
      z <<- z
      print("You initialized MyClass!")
    },
    hello = function()
    {
      "This method returns the string 'hello'."
      "hello"
    },
    doubleY = function()
    {
      2 * y
    },
    printInput = function(input)
    {
      if(missing(input)) stop("You must provide some input.")
      print(input)
    }
  )
)

Then you create instances of the class by calling the generator object.

obj1 <- MyClass$new()
obj1$hello()
obj1$doubleY()

obj2 <- MyClass$new(x = TRUE, z = "ZZZ")
obj2$printInput("I'm printing a line!")
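
For comparison, here is a sketch of how the class from the question might look as a Reference Class. The class name MyClassRC and the setA method are made up for this illustration (the name is chosen so it does not clash with the example above); the rest uses only setRefClass, callSuper and $copy() from the methods package.

```r
# A sketch of the question's Python class as a Reference Class.
MyClassRC <- setRefClass(
  "MyClassRC",
  fields = list(variableA = "numeric", variableB = "numeric"),
  methods = list(
    initialize = function(...) {
      # Set defaults first, then let callSuper() apply any fields passed to $new().
      variableA <<- 1
      variableB <<- 2
      callSuper(...)
    },
    hello = function() "Hello",
    goodbye = function() "Goodbye",
    ohDear = function() "Have no clue",
    setA = function(value) {
      # <<- modifies the field; a plain <- would only create a local variable.
      variableA <<- value
      invisible(.self)
    }
  )
)

obj <- MyClassRC$new()
obj$hello()
obj$variableA

# Reference semantics: both names point at the same object.
alias <- obj
alias$setA(10)
obj$variableA    # the change made through alias is visible here

# Use $copy() when an independent object is wanted.
independent <- obj$copy()
independent$setA(99)
obj$variableA    # unaffected by the change to the copy
```

Unlike most R objects, obj is modified in place here, which is the behavior a Python programmer would expect.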

Further reading: the OO field guide chapter of Advanced R.

Richie Cotton answered Oct 23 '22 21:10


I recently wrote up a comparison of Python classes and R S4 classes, which can be found at:

http://practicalcomputing.org/node/80

Classes are very different in R and Python: in how they are declared, how they are used, and how they work.

Per mbinette's request in the comments, here is the full text of the post (minus most hyperlinks since I only have privileges for two):

To anyone who has programmed in Python, C++, Java, or another common object-oriented language, object-oriented programming in R can be quite confusing. Following in the spirit of the Rosetta code examples in the book, here I compare code that creates and uses a class in Python to code that creates and uses a class in R.

The first layer of confusion is that R has several different systems for object-oriented programming: S3, S4, and R5 (an informal name for Reference Classes). The first decision you confront is which of these to pick for your project. S3 has been around the longest, and is widely used. Its functionality is limited in some key respects, but the programmer has quite a bit of flexibility in how to code classes. S4, a newer system, addresses some of the limitations of S3. It is a bit more complicated and rigid to code, but ultimately more powerful to use. In general, folks use S3 when working on existing code that already has S3 objects, and S4 when implementing new code from scratch. Many newer Bioconductor packages, for example, are written with S4. Hadley Wickham has an excellent summary of S3, S4, and R5, among other aspects of R, that is a great place to learn more about object-oriented programming in R.

Here I focus on the S4 system.

Below is the definition for a simple Circle class in Python. It has an __init__() constructor method for setting values when new instances are created, some methods for setting values, some methods for getting values, and a method for modifying the class instance by calculating the diameter from the radius.

class Circle:

    ## Contents
    radius = None
    diameter = None

    ## Methods
    # Constructor for creating new instances
    def __init__(self, r=None):
        self.radius = r

    # Value setting methods
    def setradius(self, r):
        self.radius = r

    def setdiameter(self, d):
        self.diameter = d

    # Value getting methods
    def getradius(self):
        return(self.radius)

    def getdiameter(self):
        return(self.diameter)

    # Method that alters a value
    def calc_diameter(self):
        self.diameter = 2 * self.radius

Once you have created this class, creating and using an instance (in IPython) looks like this:

In [3]: c = Circle()

In [4]: c.setradius(2)

In [5]: c.calc_diameter()

In [6]: c.getradius()
Out[6]: 2

In [7]: c.getdiameter()
Out[7]: 4

The Circle() function creates a new instance of the class Circle using the constructor defined by __init__(). We use the .setradius() method to set the radius value, and the .calc_diameter() method to calculate the diameter from the radius and update the diameter value in the class instance. We then use the methods we built to get the values for the radius and diameter. We could also of course directly access the radius and diameter values, using the same dot notation that we used to call the functions:

In [8]: c.radius
Out[8]: 2

In [9]: c.diameter
Out[9]: 4

As in C++, Java, and many other common languages, both methods and data variables are attributes of the class. In addition, methods have direct read and write access to data attributes. In this case the .calc_diameter() method replaced the diameter value with a new value, without needing to change anything else about the class instance.

Now for S4 objects in R, which are very, very different. Here is a similar Circle class in R:

setClass(
    Class = "Circle", 
    representation = representation(
        radius = "numeric", 
        diameter = "numeric"
    )
)

# Value setting methods
# Note that the second argument to a function that is defined with setReplaceMethod() must be named value
setGeneric("radius<-", function(self, value) standardGeneric("radius<-"))
setReplaceMethod("radius", 
    "Circle", 
    function(self, value) {
        self@radius <- value
        self
    }
)

setGeneric("diameter<-", function(self, value) standardGeneric("diameter<-"))
setReplaceMethod("diameter", 
    "Circle", 
    function(self, value) {
        self@diameter <- value
        self
    }
)

# Value getting methods
setGeneric("radius", function(self) standardGeneric("radius"))
setMethod("radius", 
    signature(self = "Circle"), 
    function(self) {
        self@radius
    }
)

setGeneric("diameter", function(self) standardGeneric("diameter"))
setMethod("diameter", 
    signature(self = "Circle"), 
    function(self) {
        self@diameter
    }
)


# Method that calculates one value from another
setGeneric("calc_diameter", function(self) { standardGeneric("calc_diameter")})
setMethod("calc_diameter", 
    signature(self = "Circle"), 
    function(self) {
        self@diameter <- self@radius * 2
        self
    }
)

Once you have created this class, creating and using an instance (in the R interactive console) looks like this:

> a <- new("Circle")
> radius(a) <- 2
> a <- calc_diameter(a)
> radius(a)
[1] 2
> diameter(a)
[1] 4

The new("Circle") call created a new instance of the Circle class, which we assigned to a variable called a. The radius(a) <- 2 line created a copy of object a, updated the value of radius to 2, and then pointed a to the new updated object. This was accomplished with the radius<- method defined above.
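
To make the replacement syntax less mysterious, here is a small self-contained sketch. The class name Disc and the generic size<- are hypothetical names used only for this illustration; the point is that the arrow form is shorthand for an ordinary function call whose result is assigned back.

```r
setClass("Disc", representation(size = "numeric"))

# A replacement generic: its name ends in <- and its last argument is named value.
setGeneric("size<-", function(self, value) standardGeneric("size<-"))
setReplaceMethod("size", "Disc", function(self, value) {
  self@size <- value
  self
})

# The replacement syntax ...
d <- new("Disc")
size(d) <- 2

# ... is shorthand for calling the function named `size<-`
# and assigning its result back to the same variable.
d2 <- new("Disc")
d2 <- `size<-`(d2, value = 2)

identical(d, d2)
```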

We defined calc_diameter() as a method for the Circle class, but note that we DON'T call it as if it were an attribute of the class. That is, we don't use syntax like a.calc_diameter(). Instead, we call calc_diameter() just as we would any other stand-alone function, and we pass the object to the method as the first argument.

In addition, we didn't just call calc_diameter(a); we assigned the output back to a. This is because objects in R are passed to functions as values, not references. The function works on a copy of the object, not the original object. That copy is then manipulated within the function, and if you want the modified object back you have to do two things. First, the object has to be the last expression evaluated in the function (hence the lonely self lines in the method definitions). In R, this has the same effect as calling return(). Second, you have to assign the returned value back to your object variable when you call the method. This is why the full line is a <- calc_diameter(a).
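
The effect of forgetting the assignment can be seen in a short sketch. The Box class and the calcArea generic are hypothetical names invented for this example.

```r
setClass("Box", representation(side = "numeric", area = "numeric"))

setGeneric("calcArea", function(self) standardGeneric("calcArea"))
setMethod("calcArea", "Box", function(self) {
  self@area <- self@side^2   # changes the function's own copy only
  self                       # the last expression is the return value
})

b <- new("Box", side = 3)

calcArea(b)       # the modified copy is printed and then discarded
length(b@area)    # b itself is unchanged, so the area slot is still empty

b <- calcArea(b)  # assign the result back to keep the change
b@area
```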

The radius(a) and diameter(a) calls execute the methods we defined for returning these values.

You can also directly access data attributes of an object in R just as you can an object in Python. Instead of using the dot notation, though, you use @ notation:

> a@radius
[1] 2
> a@diameter
[1] 4

In R, data attributes are referred to as "slots". The @ syntax gives you access to those data attributes. But what about methods? Unlike in Python, in R methods are not attributes of objects; they are defined by setMethod() to act on particular objects. The class that the method acts on is determined by the signature argument. There can be more than one method with the same name, though, each acting on a different class. This is because the method that is called doesn't just depend on the method's name; it also depends on the type of the arguments. A familiar example is plot(). To the user it looks like there is one plot() function, but in fact there are many plot() methods that are each specific to a particular class. The method that is called depends on the class of the object that is passed to plot().
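
Here is a minimal sketch of one name with several class-specific methods. The classes Square and Disk and the generic area are hypothetical and exist only for this example.

```r
setClass("Square", representation(side = "numeric"))
setClass("Disk", representation(radius = "numeric"))

# One generic, declared once ...
setGeneric("area", function(shape) standardGeneric("area"))

# ... and one method per class. The signature picks the class each method handles.
setMethod("area", signature(shape = "Square"), function(shape) shape@side^2)
setMethod("area", signature(shape = "Disk"), function(shape) pi * shape@radius^2)

area(new("Square", side = 2))   # runs the Square method
area(new("Disk", radius = 1))   # runs the Disk method

# Methods are registered with the generic, not stored inside the objects.
existsMethod("area", "Square")
existsMethod("area", "numeric")
```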

This gets to the setGeneric() lines in the class definition. If you are defining a new method using a name that already exists (such as plot()), you don't need it. This is because setMethod() defines new versions of existing methods. The new versions take a different set of data types than the existing versions of the same name. If you are defining a function with a new name, though, you first have to declare the function. setGeneric() takes care of this declaration, creating what is essentially a placeholder that you then promptly override.
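
A short sketch of both cases, using show() and length() as the existing names rather than plot(). The Temperature class and the toFahrenheit generic are hypothetical names for this illustration.

```r
setClass("Temperature", representation(celsius = "numeric"))

# show() and length() already exist, so setMethod() alone is enough.
setMethod("show", "Temperature", function(object) {
  cat("<Temperature:", object@celsius, "degrees C>\n")
})
setMethod("length", "Temperature", function(x) length(x@celsius))

# toFahrenheit is a new name, so it has to be declared with setGeneric() first.
setGeneric("toFahrenheit", function(object) standardGeneric("toFahrenheit"))
setMethod("toFahrenheit", "Temperature", function(object) {
  object@celsius * 9 / 5 + 32
})

t1 <- new("Temperature", celsius = c(0, 100))
t1                 # printed by the show() method defined above
length(t1)
toFahrenheit(t1)
```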

The differences between classes in Python and R aren't just cosmetic: very different things are happening under the hood, and classes are used in different ways in each language. There are a few things that are particularly frustrating about creating and using classes in R, though. Creating S4 classes in R takes a lot more typing, and much of it is redundant (for example, each method name had to be specified three times in the example above). Because R methods work on values rather than references, a method that modifies an object modifies a copy of it, and there can be a large performance hit for even simple manipulations once objects get large. This applies to every method that modifies an object, since the modified copy has to be returned and assigned back to the original variable. These issues have probably contributed to the recent rapid growth in popularity of Python tools for numerical analysis, such as pandas. That said, R remains a powerful tool, it is well suited to many common problems, and the rich ecosystem of R libraries is indispensable for many analyses.

Casey Dunn answered Oct 23 '22 21:10


I am coming from the Python world, and at the beginning it was hard to think in terms of R classes. Finally, I made it. I think.

I was able to convert Python- and Java-like classes to S4 in R.

This package, rODE, was created with the specific purpose of helping the uninitiated learn S4 classes. The package is about solving ordinary differential equations and contains a little bit of everything about S4 classes.

Hope it helps. Here is the link:

https://github.com/f0nzie/rODE

The package is also on CRAN.

Now, on to the answer. There are many ways of doing what you want in R. Here is the first way, using S4 classes and a prototype to initialize the values of your variables A and B.

setClass("MyClass", slots = c(
    A = "numeric",
    B = "numeric"
    ),
    prototype = prototype(
        A = 1,
        B = 2
    )
)

# generic functions
setGeneric("hello", function(object, ...) standardGeneric("hello"))
setGeneric("goodbye", function(object, ...) standardGeneric("goodbye"))
setGeneric("ohDear", function(object, ...) standardGeneric("ohDear"))

# Methods of the class
setMethod("hello", "MyClass", function(object, ...) {
    return("Hello")
})    

setMethod("goodbye", "MyClass", function(object, ...) {
    return("Goodbye")
})

setMethod("ohDear", "MyClass", function(object, ...) {
    return("Have no clue")
})


# instantiate a class
mc <- new("MyClass")

# use the class methods
hello(mc)
goodbye(mc)
ohDear(mc)

The second way of doing this is using the initialize method.

setClass("MyClass", slots = c(
    A = "numeric",
    B = "numeric"
    )
)

# generic functions
setGeneric("hello", function(object, ...) standardGeneric("hello"))
setGeneric("goodbye", function(object, ...) standardGeneric("goodbye"))
setGeneric("ohDear", function(object, ...) standardGeneric("ohDear"))

# Methods of the class
setMethod("initialize", "MyClass", function(.Object, ...) {
    .Object@A <- 1
    .Object@B <- 2
    return(.Object)
})

# Methods of the class
setMethod("hello", "MyClass", function(object, ...) {
    return("Hello")
})    

setMethod("goodbye", "MyClass", function(object, ...) {
    return("Goodbye")
})

setMethod("ohDear", "MyClass", function(object, ...) {
    return("Have no clue")
})


# instantiate a class
mc <- new("MyClass")

# use the class methods
hello(mc)
goodbye(mc)
ohDear(mc)

mc@A       # get value on slot A
mc@B       # get value on slot B
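
One thing to be aware of: an initialize method like the one above ignores anything passed to new(). A common variation, sketched below, calls callNextMethod() so that slot values given to new() are still honored and the defaults only fill the gaps. The class name MyClassInit is hypothetical, chosen so that it does not overwrite the class defined above.

```r
setClass("MyClassInit", slots = c(A = "numeric", B = "numeric"))

setMethod("initialize", "MyClassInit", function(.Object, ...) {
  # Let the default initializer fill in any slots passed to new() ...
  .Object <- callNextMethod(.Object, ...)
  # ... then supply defaults for slots that were left empty.
  if (length(.Object@A) == 0) .Object@A <- 1
  if (length(.Object@B) == 0) .Object@B <- 2
  .Object
})

mc1 <- new("MyClassInit")          # both defaults apply
mc2 <- new("MyClassInit", B = 20)  # B is supplied, A falls back to its default

c(mc1@A, mc1@B)
c(mc2@A, mc2@B)
```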

A third way of doing this is by using a constructor and initializing the class variables outside the class with the constructor function:

setClass("MyClass", slots = c(
    A = "numeric",
    B = "numeric"
    )
)

# generic functions
setGeneric("hello", function(object, ...) standardGeneric("hello"))
setGeneric("goodbye", function(object, ...) standardGeneric("goodbye"))
setGeneric("ohDear", function(object, ...) standardGeneric("ohDear"))

# Methods of the class
setMethod("initialize", "MyClass", function(.Object, ...) {
    return(.Object)
})

# Methods of the class
setMethod("hello", "MyClass", function(object, ...) {
    return("Hello")
})    

setMethod("goodbye", "MyClass", function(object, ...) {
    return("Goodbye")
})

setMethod("ohDear", "MyClass", function(object, ...) {
    return("Have no clue")
})


# constructor function
MyClass <- function() {
    myclass <- new("MyClass")
    myclass@A <- 1            # assignment
    myclass@B <- 2
    return(myclass)           # return the class initialized
}

# instantiate a class
mc <- MyClass()

# use the class methods
hello(mc)
goodbye(mc)
ohDear(mc)

mc@A       # get value on slot A
mc@B       # get value on slot B

This still has room for improvement since we shouldn't be dealing with the raw name of the slots outside the class. Encapsulation, remember? Here is the fourth and better way using setReplaceMethod:

setClass("MyClass", slots = c(
    A = "numeric",
    B = "numeric"
    )
)

# generic functions
setGeneric("hello", function(object, ...) standardGeneric("hello"))
setGeneric("goodbye", function(object, ...) standardGeneric("goodbye"))
setGeneric("ohDear", function(object, ...) standardGeneric("ohDear"))
setGeneric("getA", function(object, ...) standardGeneric("getA"))
setGeneric("getB", function(object, ...) standardGeneric("getB"))
setGeneric("setA<-", function(object, ..., value) standardGeneric("setA<-"))
setGeneric("setB<-", function(object, ..., value) standardGeneric("setB<-"))

# Methods of the class
setMethod("initialize", "MyClass", function(.Object, ...) {
    return(.Object)
})

# Methods of the class
setMethod("hello", "MyClass", function(object, ...) {
    return("Hello")
})    

setMethod("goodbye", "MyClass", function(object, ...) {
    return("Goodbye")
})

setMethod("ohDear", "MyClass", function(object, ...) {
    return("Have no clue")
})

setMethod("getA", "MyClass", function(object, ...) {
    return(object@A)
})

setMethod("getB", "MyClass", function(object, ...) {
    return(object@B)
})

setReplaceMethod("setA", "MyClass", function(object, ..., value) {
   object@A <- value
   object
})

setReplaceMethod("setB", "MyClass", function(object, ..., value) {
   object@B <- value
   object
})

# constructor function
MyClass <- function() {
    myclass <- new("MyClass")
    return(myclass)           # return the class initialized
}

# instantiate a class
mc <- MyClass()

# use the class methods
hello(mc)
goodbye(mc)
ohDear(mc)

setA(mc) <- 1
setB(mc) <- 2

getA(mc)       # get value on slot A
getB(mc)       # get value on slot B

And the fifth way is to create a generic constructor function that instantiates the class, which is useful for validating the input, even for missing parameters:

.MyClass <- setClass("MyClass", slots = c(
    A = "numeric",
    B = "numeric"
    )
)

# generic functions
setGeneric("hello", function(object, ...) standardGeneric("hello"))
setGeneric("goodbye", function(object, ...) standardGeneric("goodbye"))
setGeneric("ohDear", function(object, ...) standardGeneric("ohDear"))
setGeneric("getA", function(object, ...) standardGeneric("getA"))
setGeneric("getB", function(object, ...) standardGeneric("getB"))
setGeneric("setA<-", function(object, ..., value) standardGeneric("setA<-"))
setGeneric("setB<-", function(object, ..., value) standardGeneric("setB<-"))
setGeneric("MyClass", function(A, B, ...) standardGeneric("MyClass"))

# Methods of the class
setMethod("initialize", "MyClass", function(.Object, ...) {
    return(.Object)
})

# Methods of the class
setMethod("hello", "MyClass", function(object, ...) {
    return("Hello")
})    

setMethod("goodbye", "MyClass", function(object, ...) {
    return("Goodbye")
})

setMethod("ohDear", "MyClass", function(object, ...) {
    return("Have no clue")
})

setMethod("getA", "MyClass", function(object, ...) {
    return(object@A)
})

setMethod("getB", "MyClass", function(object, ...) {
    return(object@B)
})

setReplaceMethod("setA", "MyClass", function(object, ..., value) {
   object@A <- value
   object
})

setReplaceMethod("setB", "MyClass", function(object, ..., value) {
   object@B <- value
   object
})

setMethod("MyClass", signature(A="numeric", B="numeric"), function(A, B, ...) {
    myclass <- .MyClass()
    myclass@A <- A
    myclass@B <- B
    return(myclass)
})

# instantiate the class with values
mc <- MyClass(A = 1, B = 2)

# use the class methods
hello(mc)
goodbye(mc)
ohDear(mc)

getA(mc)       # get value on slot A
getB(mc)       # get value on slot B
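
The constructor above only has a method for the case where both A and B are numeric. Here is a sketch of how the missing-parameter case and a simple input check could be added. The class name Pair, the generic makePair, the default values and the error message are all assumptions made for this illustration.

```r
.Pair <- setClass("Pair", slots = c(A = "numeric", B = "numeric"))

setGeneric("makePair", function(A, B, ...) standardGeneric("makePair"))

# Both values supplied: validate them, then build the object.
setMethod("makePair", signature(A = "numeric", B = "numeric"),
  function(A, B, ...) {
    if (length(A) != 1 || length(B) != 1) {
      stop("A and B must each be a single number")
    }
    .Pair(A = A, B = B)
  }
)

# Nothing supplied: "missing" in the signature matches absent arguments.
setMethod("makePair", signature(A = "missing", B = "missing"),
  function(A, B, ...) {
    .Pair(A = 1, B = 2)
  }
)

p1 <- makePair(3, 4)
p2 <- makePair()      # falls back to the defaults
c(p2@A, p2@B)

# Invalid input is rejected by the check in the numeric method.
msg <- tryCatch(makePair(1:3, 4), error = function(e) conditionMessage(e))
msg
```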

f0nzie answered Oct 23 '22 21:10