I am fitting an ordered logit model with four different packages: VGAM, rms, MASS, and ordinal, using the wine data set from the ordinal package.
First is vglm():
library(VGAM)
vglmfit <- vglm(rating ~ temp * contact, data = wine,
                family = cumulative(parallel = TRUE, reverse = TRUE))
The coefficients are:
(Intercept):1 (Intercept):2 (Intercept):3 (Intercept):4
1.4112568 -1.1435551 -3.3770742 -4.9419773
tempwarm contactyes tempwarm:contactyes
2.3212033 1.3474598 0.3595241
Second is orm():
library(rms)
ormfit <- orm(rating ~ temp * contact, data = wine)
The coefficients are:
Coef S.E. Wald Z Pr(>|Z|)
y>=2 1.4113 0.5454 2.59 0.0097
y>=3 -1.1436 0.5097 -2.24 0.0248
y>=4 -3.3771 0.6382 -5.29 <0.0001
y>=5 -4.9420 0.7509 -6.58 <0.0001
temp=warm 2.3212 0.7009 3.31 0.0009
contact=yes 1.3475 0.6604 2.04 0.0413
temp=warm * contact=yes 0.3595 0.9238 0.39 0.6971
Third is polr():
library(MASS)
polrfit <- polr(rating ~ temp * contact, method="logistic", data = wine)
The output is:
Coefficients:
tempwarm contactyes tempwarm:contactyes
2.3211214 1.3474055 0.3596357
Intercepts:
1|2 2|3 3|4 4|5
-1.411278 1.143507 3.377005 4.941901
Last is clm():
library(ordinal)
clmfit <- clm(rating ~ temp * contact, link="logit", data = wine)
The output is:
Coefficients:
tempwarm contactyes tempwarm:contactyes
2.3212 1.3475 0.3595
Threshold coefficients:
1|2 2|3 3|4 4|5
-1.411 1.144 3.377 4.942
In addition, when I set reverse=FALSE in vglm():
library(VGAM)
vglmfit <- vglm(rating ~ temp * contact, data = wine,
                family = cumulative(parallel = TRUE, reverse = FALSE))
Coefficients:
(Intercept):1 (Intercept):2 (Intercept):3 (Intercept):4
-1.4112568 1.1435551 3.3770742 4.9419773
tempwarm contactyes tempwarm:contactyes
-2.3212033 -1.3474598 -0.3595241
You may notice that the coefficients from vglm() with reverse=TRUE and those from orm() are the same, and that the ones from polr() and clm() are the same. So there are two sets of coefficients, and the only difference between them is the sign of the intercepts (a quick check of this is shown below).
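Here is a quick numeric check of that observation (a sketch, assuming the four fits above are still in the workspace; polr objects store their intercepts in the zeta component, and clm objects store thresholds and slopes in alpha and beta):
# The slope estimates agree across all four packages:
cbind(vglm = coef(vglmfit)[5:7],
      orm  = coef(ormfit)[5:7],
      polr = coef(polrfit),
      clm  = clmfit$beta)
# The vglm()/orm() intercepts are the negated polr()/clm() thresholds:
cbind(vglm = coef(vglmfit)[1:4],
      orm  = coef(ormfit)[1:4],
      polr = -polrfit$zeta,
      clm  = -clmfit$alpha)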
And when I set reverse=FALSE, it does flip the sign of the intercepts, but at the same time it also flips the sign of the variable coefficients, which I don't want. Why does that happen? How could I get exactly the same result from every package, or how can I explain the difference?
This is all just a matter of parametrization. One classical way to motivate the ordered logistic regression model is to assume that there is a latent continuous response

y* = x'b + e

where e has a standard logistic distribution. It is then assumed that y* itself is not observed; instead, one only observes the discretized category y = j if y* falls between the cut-offs a_{j-1} and a_j. This leads to the model equation

logit(P(y <= j)) = a_j - x'b

Other motivations lead to similar equations but with P(y >= j) and/or a_j + x'b. This just produces the switches in the signs of the a and/or b coefficients that you observe across the different implementations. The corresponding models and predictions are equivalent, of course. Which parametrization you find easier to interpret is mostly a matter of taste.
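To see this concretely, here is a minimal sketch (it refits two of the models above, so only the wine data from the ordinal package is assumed). polr() fits logit(P(y <= j)) = a_j - x'b, while vglm() with reverse=FALSE fits logit(P(y <= j)) = a_j + x'b, so negating the vglm() slopes recovers the polr() coefficients, and the fitted probabilities are identical either way:
library(VGAM)
library(MASS)
data(wine, package = "ordinal")

vglmfit2 <- vglm(rating ~ temp * contact, data = wine,
                 family = cumulative(parallel = TRUE, reverse = FALSE))
polrfit2 <- polr(rating ~ temp * contact, method = "logistic", data = wine)

# Map the reverse=FALSE parametrization onto polr()'s:
# thresholds match directly, slopes match after a sign flip.
cbind(polr = c(polrfit2$zeta, coef(polrfit2)),
      vglm = c(coef(vglmfit2)[1:4], -coef(vglmfit2)[5:7]))

# The implied category probabilities agree up to numerical tolerance:
max(abs(fitted(vglmfit2) - predict(polrfit2, type = "probs")))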