Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Use poly() in R formula to predict

I have a question about the formula and the user defined function:

Case 1:

 clotting <- data.frame(
     u = c(5,10,15,20,30,40,60,80,100),
     lot1 = c(118,58,42,35,27,25,21,19,18),
     lot2 = c(69,35,26,21,18,16,13,12,12))

 g1 = glm(lot1 ~ log(u) + poly(u,1), data = clotting, family = Gamma)
 dc = clotting
 dc$u = 1
 predict(g1, dc)

      1           2           3           4           5           6           7           8           9
 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929

However, if I just simply wrap the poly as a user defined function ( in reality I would have my own more complex function) then I will get error:

Case 2:

 xpoly <- function(x, degree=1){poly(x,degree)}
 g2 = glm(lot1 ~ log(u) + xpoly(u,1), data = clotting, family = Gamma)
 predict(g2, dc)
       Error in poly(x, degree) :
      'degree' must be less than number of unique points

It seems that the predict treats the user defined function in the formula with I(). My question is how can I get the results for Case2 same as case1?

Anyone can have any idea about this?

like image 789
alex Avatar asked Jul 16 '15 20:07

alex


1 Answers

poly is a bit of a unique function here. By default, it returns a set of orthogonal polynomials so it's doing some centering and rescaling of the data. If you want to be able to predict using the coefficients from the fitted model, you would need to transform new data in the same way that was done with the original data. This means some additional data must be passed along.

First I will point out that if you use the raw, non-orthogonal values, you would not experience this problem.

g1 <- glm(lot1 ~ log(u) + poly(u,1, raw=T), data = clotting, family = Gamma)
xpoly<-function(x,degree=1){poly(x,degree, raw=T)}
g2 <- glm(lot1 ~ log(u) + xpoly(u,1), data = clotting, family = Gamma)

dc=clotting
dc$u=1
predict(g1,dc)
#       1           2           3           4           5           6           7           8           9 
#-0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 
predict(g2,dc)
#       1           2           3           4           5           6           7           8           9 
#-0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929

But let's further explore how poly passing along the scaling information to predict. The work actually happens in the model.frame function. Compare these two results

attr(terms(model.frame(lot1 ~ log(u) + poly(u,1), clotting)), "predvar")
# list(lot1, log(u), poly(u, 1, coefs = list(alpha = 40, norm2 = c(1, 
9, 8850))))
attr(terms(model.frame(lot1 ~ log(u) + xpoly(u,1), clotting)), "predvar")
# list(lot1, log(u), xpoly(u, 1))

You can see that the call to poly() in the first formula has been adjusted in the predvar attribute of the formula that's returned. This done in the model.frame code

...
if (is.null(attr(formula, "predvars"))) {
    for (i in seq_along(varnames)) predvars[[i + 1L]] <- makepredictcall(variables[[i]], 
        vars[[i + 1L]])
    attr(formula, "predvars") <- predvars
}
...

Note that it calls the makepredictcall() function which is a generic function which dispatches based on the class of the returned object. It happens that poly returns an object of class "poly"

class(poly(1:5, 1))
# [1] "poly"   "matrix"

So what get's called for the "poly" data is this function

stats:::makepredictcall.poly
function (var, call) 
{
    if (as.character(call)[1L] != "poly") 
        return(call)
    call$coefs <- attr(var, "coefs")
    call
}
<bytecode: 0x123262178>
<environment: namespace:stats>

This is where the coef= attributes are added. But also note that it checks that the call was from the "poly" function itself. Since your function is named "xpoly" but returns a "poly" object, the coefficient information isn't returned. One workaround would be to change the return class of your object and create your own makepredictcall function. For example you could do

xpoly <- function(...){p<-poly(...); class(p)[1]<-"xpoly"; p}
makepredictcall.xpoly <- function(var, call) {
    call$coefs <- attr(var, "coefs")
    call
}

Note that this new version of xpoly will also accept the coef= argument and pass it along to poly() via the ... parameters. Then you can run

g1 <- glm(lot1 ~ log(u) + poly(u,1), data = clotting, family = Gamma)
g2 <- glm(lot1 ~ log(u) + xpoly(u,1), data = clotting, family = Gamma)
predict(g1,dc)
#          1           2           3           4           5           6           7           8           9 
#-0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929
predict(g2,dc)
#          1           2           3           4           5           6           7           8           9 
#-0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929 -0.01398929
like image 84
MrFlick Avatar answered Nov 15 '22 08:11

MrFlick