Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

S3 and order of classes

Tags:

r

glmnet

I've alway had trouble understanding the the documentation on how S3 methods are called, and this time it's biting me back.

I'll apologize up front for asking more than one question, but they are all closely related. Deep in the heart of a complex set of functions, I create a lot of glmnet fits, in particular logistic ones. Now, glmnet documentation specifies its return value to have both classes "glmnet" and (for logistic regression) "lognet". In fact, these are specified in this order.

However, looking at the end of the implementation of glmnet, righter after the call to (the internal function) lognet, that sets the class of fit to "lognet", I see this line of code just before the return (of the variable fit):

class(fit) = c(class(fit), "glmnet")

From this, I would conclude that the order of the classes is in fact "lognet", "glmnet".

Unfortunately, the fit I had, had (like the doc suggests):

> class(myfit)
[1] "glmnet" "lognet"

The problem with this is the way S3 methods are dispatched for it, in particular predict. Here's the code for predict.lognet:

function (object, newx, s = NULL, type = c("link", "response", 
    "coefficients", "class", "nonzero"), exact = FALSE, offset, 
    ...) 
{
    type = match.arg(type)
    nfit = NextMethod("predict") #<- supposed to call predict.glmnet, I think
    switch(type, response = {
        pp = exp(-nfit)
        1/(1 + pp)
    }, class = ifelse(nfit > 0, 2, 1), nfit)
}

I've added a comment to explain my reasoning. Now when I call predict on this myfit with a new datamatrix mydata and type="response", like this:

predict(myfit, newx=mydata, type="response")

, I do not, as per the documentation, get the predicted probabilities, but the linear combinations, which is exactly the result of calling predict.glmnet immediately.

I've tried reversing the order of the classes, like so:

orgclass<-class(myfit)
class(myfit)<-rev(orgclass)

And then doing the predict call again: lo and behold: it works! I do get the probabilities.

So, here come some questions:

  1. Am I right in 'having learned' that S3 methods are dispatched in order of appearance of the classes?
  2. Am I right in assuming the code in glmnetwould cause the wrong order for correct dispatching of predict?
  3. In my code there is nothing that manipulates classes explicitly/visibly to my knowledge. What could cause the order to change?

For completeness' sake: here's some sample code to play around with (as I'm doing myself now):

library(glmnet)
y<-factor(sample(2, 100, replace=TRUE))
xs<-matrix(runif(100), ncol=1)
colnames(xs)<-"x"
myfit<-glmnet(xs, y, family="binomial")
mydata<-matrix(runif(10), ncol=1)
colnames(mydata)<-"x"
class(myfit)
predict(myfit, newx=mydata, type="response")
class(myfit)<-rev(class(myfit))
class(myfit)
predict(myfit, newx=mydata, type="response")
class(myfit)<-rev(class(myfit))#set it back
class(myfit)

Depending on the data generated, the difference is more or less obvious (in my true dataset I noticed negative values in the so called probabilities, which is how I picked up the problem), but you should indeed see a difference.

Thanks for any input.

Edit:

I just found out the horrible truth: either order worked in glmnet 1.5.2 (which is present on the server where I ran the actual code, resulting in the fit with the class order reversed), but the code from 1.6 requires the order to be "lognet", "glmnet". I have yet to check what happens in 1.7.

Thanks to @Aaron for reminding me of the basics of informatics (besides 'if all else fails, restart': 'check your versions'). I had mistakenly assumed that a package by the gods of statistical learning would be protected from this type of error), and to @Gavin for confirming my reconstruction of how S3 works.

like image 523
Nick Sabbe Avatar asked Jun 23 '11 22:06

Nick Sabbe


1 Answers

Yes, the order of dispatch is in the order in which the classes are listed in the class attribute. In the simple, every-day case, yes, the first stated class is the one chosen first by method dispatch, and only if it fails to find a method for that class (or NextMethod is called) will it move on to the second class, or failing that search for a default method.

No, I do not think you are right that the order of the classes is wrong in the code. The documentation appears wrong. The intent is clearly to call predict.lognet() first, use the workhorse predict.glmnet() to do the basic computations for all types of lasso/elastic net models fitted by glmnet, and finally do some post processing of those general predictions. That predict.glmnet() is not exported from the glmnet NAMESPACE whilst the other methods are is perhaps telling, also.

I'm not sure why you think the output from this:

predict(myfit, newx=mydata, type="response")

is wrong? I get a matrix of 10 rows and 21 columns, with the columns relating to the intercept-only model prediction plus predictions at 20 values of lambda at which model coefficients along the lasso/elastic net path have been computed. These do not seem to be linear combinations and are one the response scale as you requested.

The order of the classes is not changing. I think you are misunderstanding how the code is supposed to work. There is a bug in the documentation, as the ordering is stated wrong there. But the code is working as I think it should.

like image 174
Gavin Simpson Avatar answered Sep 27 '22 19:09

Gavin Simpson