Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Predict function in Clogit (Conditional logit)

I am trying to understand how the predict function works for conditional logit. I tried to calculate it manually but it does not work. Any help would be greatly appreciated.

library(survival)

set.seed(1)
data <- data.frame(Used = rep(c(1,0,0,0),1250),
                   Open = round(runif(5000,0,50),0),
                   Strata = rep(1:1250,each=4))

mod <- clogit(Used ~ Open + strata(Strata),data=data)

newdata <- data.frame(Open = seq(0,10,1),Strata=1)

risk <- predict(mod,newdata=newdata,type = "risk")
risk


lp <- predict(mod,newdata=newdata,type = "lp")
lp


##manual calculation
coef<-data.frame(coef = summary(mod)$coefficients[,1],
                 se= summary(mod)$coefficients[,3])
coef$se <-summary(mod)$coefficients[,3]
coef$UpCI <- coef[,1] + (coef[,2]*2) ### this could be *1.96 but using 2 for simplicity
coef$LowCI <-coef[,1] - (coef[,2]*2) ### this could be *1.96 but using 2 for simplicity

fitted<-data.frame(Open= seq(0,10,1))

fitted$Marginal <- exp(coef[1,1]*fitted$Open) /
  (1+exp(coef[1,1]*fitted$Open ))

fitted$UpCI <- exp(coef[1,3]*fitted$Open )/
  (1+exp(coef[1,3]*fitted$Open))

fitted$LowCI <- exp(coef[1,4]*fitted$Open )/
  (1+exp(coef[1,4]*fitted$Open ))
like image 954
Pam G Avatar asked Aug 31 '25 04:08

Pam G


1 Answers

This turns out to be a rather deep rabbit hole. There is an answer here, but I'm not quite sure it's correct.

The basic statistical issue is described below. You will want either risk/(1+risk) or risk/[average risk per stratum]; I think these might be equivalent if there are exactly two observations per stratum ...

It may be useful to know that the risk prediction is exp(lp), and risk/(risk+1) is plogis(lp), so if you have standard errors you should be able to get confidence intervals via

pred <- predict(model, newdata, type = "lp", se.fit = TRUE)
with(pred, plogis(fit ± 1.96*se.fit))  ## abbreviated, ofc ± doesn't really work ...

(For more information on how the predict() function works, see e.g. https://stats.stackexchange.com/questions/281781/predict-functions-after-clogit-in-r-using-survival-package)

Getting confidence intervals on the risk/average_risk form seems harder.

If you have more questions about this, you might need to follow up on Cross Validated (but please include a link to this Q&A so as not to waste time re-treading old ground) ...


Searching (after cloning) the r-help directory of Michael Chirico's R mailing lists archive: specifically, you can search online here within any of the files described below.

I ran egrep -l "(predict.*clogit|clogit.*predict)" * on the command line ("search all files for text containing 'predict [... stuff ...] clogit' or 'clogit [... stuff ...] predict'")

2006-February.txt

Arne Jol:

I wonder if there is a prediction function for a clogit model which can be used in the same way as the predict function for the multinom model.

Thomas Lumley:

I don't think this is going to be possible. The point of conditional logistic regression is that the probabilities depend on stratum parameters that cannot be estimated accurately. The conditional likelihood removes these parameters, but the resulting model does not contain enough information to estimate probabilities.

2010-April.txt

Discussion between Thomas Lumley and Chuck Berry suggesting

lp <- predict(model, type = "lp", newdata = ...)
exp(lp)/ave( exp(lp), stratvar, FUN=sum )

2014-June.txt

Peter Dalgaard:

I'm rusty on this, but I think what you want is something like

m <- model.matrix(~ x1 + x2 - 1, data=dat.test)
pp <- exp(m %*% coef(fit))
pps <- ave(pp, dat.test$set, FUN=sum)
pp/pps

[this is the same as the suggestion from April 2010]

i.e. the conditional probability that an observation is a case given covariates and that there is on[e] case in each set (in the data given, you have sets of three with one being a case, so all predicted probabilities are close to 0.33). For more general matched sets, I'm not really sure what one wants. Real experts are welcome to chime in.

Terry Therneau:

As Peter D said, the clogit function simply sets up a special data set and then calls coxph, and is based on an identity that the likelihood for the conditional logistic is identical to the likelihood of a Cox model for a specially structured data set. I vacillate on whether this identity is just a mathematical accident or the mark of some deep truth. However it did allow us to use existing code rather than write new routine from scratch.

In this special data set all of the follow-up times are set to 1 (any constant value would do). For normal Cox regression a predicted "expected number of events" depends on the length of follow-up for the subject, so the predict routine expects to find a follow-up time variable in your newdata. In my code, "rep(1, 150L)" is being mistaken for the name of the time variable, and failure of the confused routine ensues in due course.

For the interim: in a conditional logistic the "expected number of events" is just exp(eta)/(1 + exp(eta)) where eta is the linear predictor. There is no follow-up time to worry about and predict.coxph would have done way too much work, even if it hadn't gotton [sic] confused. Use

risk <- predict(fit, newdata=dat.test, type='risk')
risk / (1+risk)
like image 101
Ben Bolker Avatar answered Sep 02 '25 18:09

Ben Bolker