Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Weighted logistic regression in R

Given sample data of proportions of successes plus sample sizes and independent variable(s), I am attempting logistic regression in R.

The following code does what I want and seems to give sensible results, but does not look like a sensible approach; in effect it doubles the size of the data set

datf <- data.frame(prop  = c(0.125, 0,  0.667, 1,  0.9),
                   cases = c(8,     1,  3,     3,  10),
                   x     = c(11,    12, 15,    16, 18))

datf2         <- rbind(datf,datf)
datf2$success <- rep(c(1, 0), each=nrow(datf))
datf2$cases   <- round(datf2$cases*ifelse(datf2$success,datf2$prop,1-datf2$prop))
fit2          <- glm(success ~ x, weight=cases, data=datf2, family="binomial")

datf$proppredicted    <- 1 / (1 + exp(-predict(fit2, datf)))
plot(datf$x, datf$proppredicted, type="l", col="red", ylim=c(0,1))
points(datf$x, datf$prop, cex=sqrt(datf$cases))

producing a chart like logistic prediction

which looks reasonably sensible.

But I am not happy about the use of datf2 as a way of separating the successes and failures by duplicating the data. Is something like this necessary?

As a lesser question, is there a cleaner way of calculating the predicted proportions?

like image 268
Henry Avatar asked Apr 28 '18 16:04

Henry


1 Answers

No need to construct artificial data like that; glm can fit your model from the dataset as given.

> glm(prop ~ x, family=binomial, data=datf, weights=cases)

Call:  glm(formula = prop ~ x, family = binomial, data = datf, weights = cases)

Coefficients:
(Intercept)            x  
    -9.3533       0.6714  

Degrees of Freedom: 4 Total (i.e. Null);  3 Residual
Null Deviance:      17.3 
Residual Deviance: 2.043    AIC: 11.43

You will get a warning about "non-integer #successes", but that is because glm is being silly. Compare to the model on your constructed dataset:

> fit2

Call:  glm(formula = success ~ x, family = "binomial", data = datf2, 
    weights = cases)

Coefficients:
(Intercept)            x  
    -9.3532       0.6713  

Degrees of Freedom: 7 Total (i.e. Null);  6 Residual
Null Deviance:      33.65 
Residual Deviance: 18.39    AIC: 22.39

The regression coefficients (and therefore predicted values) are basically equal. However your residual deviance and AIC are suspect because you've created artificial data points.

like image 169
Hong Ooi Avatar answered Oct 12 '22 13:10

Hong Ooi