Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Logistic Regression Using R

I am running logistic regressions using R right now, but I cannot seem to get many useful model fit statistics. I am looking for metrics similar to SAS:

http://www.ats.ucla.edu/stat/sas/output/sas_logit_output.htm

Does anyone know how (or what packages) I can use to extract these stats?

Thanks

like image 474
josephmisiti Avatar asked Nov 25 '25 09:11

josephmisiti


2 Answers

Here's a Poisson regression example:

## from ?glm:
d.AD <- data.frame(counts=c(18,17,15,20,10,20,25,13,12),
      outcome=gl(3,1,9),
      treatment=gl(3,3))
glm.D93 <- glm(counts ~ outcome + treatment,data = d.AD, family=poisson())

Now define a function to fit an intercept-only model with the same response, family, etc., compute summary statistics, and combine them into a table (matrix). The formula .~1 in the update command below means "refit the model with the same response variable [denoted by the dot on the LHS of the tilde] but with only an intercept term [denoted by the 1 on the RHS of the tilde]"

glmsumfun <- function(model) {
   glm0 <- update(model,.~1)  ## refit with intercept only
   ## apply built-in logLik (log-likelihood), AIC,
   ##  BIC (Bayesian/Schwarz Information Criterion) functions
   ## to models with and without intercept ('model' and 'glm0');
   ## combine the results in a two-column matrix with appropriate
   ## row and column names
   matrix(c(logLik(glm.D93),BIC(glm.D93),AIC(glm.D93),
           logLik(glm0),BIC(glm0),AIC(glm0)),ncol=2,
     dimnames=list(c("logLik","SC","AIC"),c("full","intercept_only")))
}

Now apply the function:

glmsumfun(glm.D93)

The results:

            full intercept_only
logLik -23.38066      -26.10681
SC      57.74744       54.41085
AIC     56.76132       54.21362

EDIT:

  • anova(glm.D93,test="Chisq") gives a sequential analysis of deviance table containing df, deviance (=-2 log likelihood), residual df, residual deviance, and the likelihood ratio test (chi-squared test) p-value.
  • drop1(glm.D93) gives a table with the AIC values (df, deviances, etc.) for each single-term deletion; drop1(glm.D93,test="Chisq") additionally gives the LRT test p value.
like image 62
Ben Bolker Avatar answered Nov 26 '25 23:11

Ben Bolker


Certainly glm with a family="binomial" argument is the function most commonly used for logistic regression. The default handling of contrasts of factors is different. R uses treatment contrasts and SAS (I think) uses sum contrasts. You can look these technical issues up on R-help. They have been discussed many, many times over the last ten+ years.

I see Greg Snow mentioned lrm in 'rms'. It has the advantage of being supported by several other functions in the 'rms' suite of methods. I would use it , too, but learning the rms package may take some additional time. I didn't see an option that would create SAS-like output.

If you want to compare the packages on similar problems that UCLA StatComputing pages have another resource: http://www.ats.ucla.edu/stat/r/dae/default.htm , where a large number of methods are exemplified in SPSS, SAS, Stata and R.

like image 37
IRTFM Avatar answered Nov 26 '25 21:11

IRTFM



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!