I am trying to use a Newton-Raphson algorithm in R to maximize a log-likelihood function that I wrote for a very specific problem. I will say honestly that estimation methods are above my head, but I know that many people in my field (psychometrics) use NR algorithms for estimation, so I am trying to use this method, at least to begin with. I have a series of nested functions that return a scalar, the log-likelihood value, for a particular vector of data:
```r
log.likelihoodSL <- function(x,sxdat1,item) {
  theta <- x[1]
  rho <- x[2]
  log.lik <- 0
  for (it in 1:length(sxdat1)) {
    val <- as.numeric(sxdat1[it])
    apars <- item[it,1:3]
    cpars <- item[it,4:6]
    log.lik <- log.lik + as.numeric(log.pSL(theta,rho,apars,cpars,val))
  }
  return(log.lik)
}

log.pSL <- function(theta,rho,apars,cpars,val) {
  p <- (rho * e.aSL(theta,apars,cpars,val)) + ((1-rho) * e.nrm(theta,apars,cpars,val))
  log.p <- log(p)
  return(log.p)
}

e.aSL <- function(theta,apars,cpars,val) {
  if (val==1) {
    aprob <- e.nrm(theta,apars,cpars,val)
  } else if (val==2) {
    aprob <- 1 - e.nrm(theta,apars,cpars,val)
  } else {
    aprob <- 0
  }
  return(aprob)
}

e.nrm <- function(theta,apars,cpars,val) {
  nprob <- exp(apars*theta + cpars)/sum(exp((apars*theta) + cpars))
  nprob <- nprob[val]
  return(nprob)
}
```
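For reference, the same log-likelihood can be computed without the loop over items. This is a sketch that assumes the model exactly as coded above (columns 1:3 of `item` are the slopes `apars`, columns 4:6 the intercepts `cpars`, responses coded 1, 2, 3); the name `log_lik_vec` is hypothetical.

```r
# Hypothetical loop-free equivalent of log.likelihoodSL(), assuming the same
# model: item[, 1:3] = apars, item[, 4:6] = cpars, responses coded 1, 2, 3.
log_lik_vec <- function(x, sxdat1, item) {
  theta <- x[1]
  rho   <- x[2]
  val   <- as.numeric(unlist(sxdat1))  # works for a vector or a one-row data frame
  # e.nrm() for every item and category at once: one row per item
  ez  <- exp(item[, 1:3, drop = FALSE] * theta + item[, 4:6, drop = FALSE])
  nrm <- ez / rowSums(ez)
  # Nominal-model probability of the observed category for each item
  p_nrm <- nrm[cbind(seq_along(val), val)]
  # e.aSL(): P(cat 1) if val == 1, 1 - P(cat 2) if val == 2, 0 otherwise
  p_asl <- ifelse(val == 1, nrm[, 1], ifelse(val == 2, 1 - nrm[, 2], 0))
  # log.pSL() summed over items
  sum(log(rho * p_asl + (1 - rho) * p_nrm))
}

# Tiny check: with all-zero item parameters every category has probability 1/3
toy_item <- matrix(0, nrow = 3, ncol = 6)
log_lik_vec(c(0, 0.5), c(1, 2, 3), toy_item)  # -log(36), i.e. about -3.58
```

In the check, the three items contribute 1/3, 1/2 and 1/6 respectively, so the log-likelihood is log(1/36).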
Each function calls the ones defined after it, in the order presented. The top-level function is called as follows:

```r
max1 <- maxNR(log.likelihoodSL,grad=NULL,hess=NULL,start=x,print.level=1,sxdat1=sxdat1,item=item)
```
Here is a sample of the input data (which I call `sxdat1` in this case):
```
> sxdat1
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18
2 1 3 1 3 3 2 2 3 2 2 2 2 2 3 2 3 2
V19 V20
2 2
```
And here is the variable `item`:
```
> item
V1 V2 V3 V4 V5 V6
[1,] 0.2494625 0.3785529 -0.6280155 -0.096817808 -0.7549263 0.8517441
[2,] 0.2023690 0.4582290 -0.6605980 -0.191895013 -0.8391203 1.0310153
[3,] 0.2044005 0.3019147 -0.5063152 -0.073135691 -0.6061725 0.6793082
[4,] 0.2233619 0.4371988 -0.6605607 -0.160377714 -0.8233197 0.9836974
[5,] 0.2257933 0.2851198 -0.5109131 -0.044494872 -0.5970246 0.6415195
[6,] 0.2047308 0.3438725 -0.5486033 -0.104356236 -0.6693569 0.7737131
[7,] 0.3402220 0.2724951 -0.6127172 0.050795183 -0.6639092 0.6131140
[8,] 0.2513672 0.3263046 -0.5776718 -0.056203015 -0.6779823 0.7341853
[9,] 0.2008285 0.3389165 -0.5397450 -0.103565987 -0.6589961 0.7625621
[10,] 0.2890680 0.2700661 -0.5591341 0.014251386 -0.6219001 0.6076488
[11,] 0.3127214 0.2572715 -0.5699929 0.041587479 -0.6204483 0.5788608
[12,] 0.2697048 0.2965255 -0.5662303 -0.020115553 -0.6470669 0.6671825
[13,] 0.2799978 0.3219374 -0.6019352 -0.031454750 -0.6929045 0.7243592
[14,] 0.2773233 0.2822723 -0.5595956 -0.003711768 -0.6314010 0.6351127
[15,] 0.2433519 0.2632824 -0.5066342 -0.014947878 -0.5774375 0.5923853
[16,] 0.2947281 0.3605812 -0.6553092 -0.049389825 -0.7619178 0.8113076
[17,] 0.2290081 0.3114185 -0.5404266 -0.061807853 -0.6388839 0.7006917
[18,] 0.3824588 0.2543871 -0.6368459 0.096053788 -0.6684247 0.5723709
[19,] 0.2405821 0.3903595 -0.6309416 -0.112333048 -0.7659758 0.8783089
[20,] 0.2424331 0.3028480 -0.5452811 -0.045311136 -0.6360968 0.6814080
```
The two parameters over which I want to maximize the function `log.likelihoodSL()` are `theta` and `rho`. I want to constrain `theta` to lie between -3 and 3 and `rho` to lie between 0 and 1, but I don't know how to do this with the current set-up. Can anybody help me out? Do I need to use a different estimation method from Newton-Raphson, or is there a way to implement this using the function `maxNR` from the package `maxLik`, which I am currently using? Thanks!
Edit: the vector `x`, which contains the start values for the parameters `theta` and `rho`, is just `c(0,0)` because that is the "average" or "default" assumption for these parameters (in terms of their substantive interpretation).
Answer: the data in a more convenient form:
```r
sxdat1 <- c(2,1,3,1,3,3,2,2,3,2,2,2,2,2,3,2,3,2,2,2)
item <- matrix(c(
  0.2494625,0.3785529,-0.6280155,-0.096817808,-0.7549263,0.8517441,
  0.2023690,0.4582290,-0.6605980,-0.191895013,-0.8391203,1.0310153,
  0.2044005,0.3019147,-0.5063152,-0.073135691,-0.6061725,0.6793082,
  0.2233619,0.4371988,-0.6605607,-0.160377714,-0.8233197,0.9836974,
  0.2257933,0.2851198,-0.5109131,-0.044494872,-0.5970246,0.6415195,
  0.2047308,0.3438725,-0.5486033,-0.104356236,-0.6693569,0.7737131,
  0.3402220,0.2724951,-0.6127172,0.050795183,-0.6639092,0.6131140,
  0.2513672,0.3263046,-0.5776718,-0.056203015,-0.6779823,0.7341853,
  0.2008285,0.3389165,-0.5397450,-0.103565987,-0.6589961,0.7625621,
  0.2890680,0.2700661,-0.5591341,0.014251386,-0.6219001,0.6076488,
  0.3127214,0.2572715,-0.5699929,0.041587479,-0.6204483,0.5788608,
  0.2697048,0.2965255,-0.5662303,-0.020115553,-0.6470669,0.6671825,
  0.2799978,0.3219374,-0.6019352,-0.031454750,-0.6929045,0.7243592,
  0.2773233,0.2822723,-0.5595956,-0.003711768,-0.6314010,0.6351127,
  0.2433519,0.2632824,-0.5066342,-0.014947878,-0.5774375,0.5923853,
  0.2947281,0.3605812,-0.6553092,-0.049389825,-0.7619178,0.8113076,
  0.2290081,0.3114185,-0.5404266,-0.061807853,-0.6388839,0.7006917,
  0.3824588,0.2543871,-0.6368459,0.096053788,-0.6684247,0.5723709,
  0.2405821,0.3903595,-0.6309416,-0.112333048,-0.7659758,0.8783089,
  0.2424331,0.3028480,-0.5452811,-0.045311136,-0.6360968,0.6814080),
  byrow=TRUE,ncol=6)
```
Using `maxNR`:
```r
library(maxLik)
x <- c(0,0)
max1 <- maxNR(log.likelihoodSL,grad=NULL,hess=NULL,start=x,
              print.level=1,sxdat1=sxdat1,item=item)
```
Note the warnings incurred when `rho` wanders negative. However, `maxNR` can recover from this and gets an estimate (theta=-1, rho=0.63) that is in the interior of the feasible set. `L-BFGS-B` can't handle non-finite interim results, but the bounds keep the algorithm away from those problematic regions.
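If you want to keep `maxNR` itself, one common alternative (not used in this answer) is to reparameterize so the optimizer works on an unconstrained scale: set theta = 3 tanh(u) and rho = plogis(v), then maximize over (u, v). The helper names `to_box`, `from_box` and `reparam_ll` below are hypothetical; the commented usage assumes `maxNR` passes extra arguments through to the log-likelihood, as the call above already relies on.

```r
# Hypothetical helpers (not part of maxLik): map an unconstrained vector z
# onto the box theta in (-3, 3), rho in (0, 1), and back.
to_box   <- function(z) c(theta = 3 * tanh(z[[1]]), rho = plogis(z[[2]]))
from_box <- function(p) c(atanh(p[[1]] / 3), qlogis(p[[2]]))

# Wrap a log-likelihood of c(theta, rho) so it becomes a function of z;
# extra arguments (e.g. sxdat1, item) are passed through unchanged.
reparam_ll <- function(ll) {
  function(z, ...) ll(to_box(z), ...)
}

# Usage with the functions from the question (requires maxLik):
# library(maxLik)
# ll_z <- reparam_ll(log.likelihoodSL)
# fit  <- maxNR(ll_z, start = from_box(c(0, 0.5)), sxdat1 = sxdat1, item = item)
# to_box(fit$estimate)   # estimates back on the theta/rho scale
```

Because tanh and plogis never reach their limits, a maximum that truly sits on a bound shows up as the unconstrained estimate drifting toward plus or minus infinity, and standard errors computed from the Hessian are on the (u, v) scale, so they would need the delta method to translate back.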
I chose to do this with `bbmle` rather than directly with `optim`: `bbmle` is a wrapper for `optim` (and other optimization tools) that offers some nice features specific to likelihood estimation (profiling, confidence intervals, likelihood ratio tests between models, etc.).
```r
library(bbmle)
## mle2() wants a NEGATIVE log-likelihood
NLL <- function(x,sxdat1,item) {
  -log.likelihoodSL(x,sxdat1,item)
}
```
Edit: in an earlier version I used `control=list(fnscale=-1)` to tell the optimizer that I was passing a log-likelihood function that should be maximized rather than minimized. This gets to the right answer, but subsequent attempts to use the results might get very confusing because the package doesn't account for this possibility (e.g. the sign of the reported log-likelihood is wrong). This could be fixed in the package, but I'm not sure it's worth it.
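To see the sign issue concretely, here is a toy comparison in base R's `optim()`, the optimizer `mle2()` wraps here. It uses a hypothetical binomial log-likelihood (7 successes in 10 trials), not the model from the question. Both routes find the same maximum, but `optim()` reports `value` on the scale of whatever function it was given, so anything downstream that assumes it received a negative log-likelihood will get the sign wrong.

```r
# Toy binomial log-likelihood: 7 successes in 10 trials, MLE p = 0.7
ll_binom  <- function(p) 7 * log(p) + 3 * log(1 - p)
nll_binom <- function(p) -ll_binom(p)

# Option 1: maximize the log-likelihood by asking optim to flip the sign internally
fit_max <- optim(0.5, ll_binom, method = "L-BFGS-B",
                 lower = 1e-3, upper = 1 - 1e-3,
                 control = list(fnscale = -1))

# Option 2: minimize the negative log-likelihood (what mle2() expects)
fit_min <- optim(0.5, nll_binom, method = "L-BFGS-B",
                 lower = 1e-3, upper = 1 - 1e-3)

c(fit_max$par, fit_min$par)      # both close to 0.7
c(fit_max$value, fit_min$value)  # about -6.11 and 6.11: same optimum, opposite signs
```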
```r
## needed when objective function takes a vector of args rather than
## separate named arguments:
parnames(NLL) <- c("theta","rho")
(m1 <- mle2(NLL,start=c(theta=0,rho=0.5),method="L-BFGS-B",
            lower=c(theta=-3,rho=2e-3),upper=c(theta=3,rho=1-2e-3),
            data=list(sxdat1=sxdat1,item=item)))
```
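A few points here:

- I started with `rho=0.5` rather than on the boundary (`rho=0`).
- I set the `rho` boundaries slightly inside [0,1] (`L-BFGS-B` doesn't always perfectly respect boundaries when computing finite difference approximations of derivatives).
- I passed the input data via the `data` argument.

In this case I get the same results as `maxNR`. (The printed call below still shows `control = list(fnscale = -1)`, which suggests the output comes from the earlier version described in the edit above.)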
```
## Call:
## mle2(minuslogl = NLL, start = c(theta = 0, rho = 0.5),
## method = "L-BFGS-B", data = list(sxdat1 = sxdat1, item = item),
## lower = c(theta = -3, rho = 0.002), upper = c(theta = 3,
## rho = 1 - 0.002), control = list(fnscale = -1))
##
## Coefficients:
## theta rho
## -1.0038531 0.6352782
##
## Log-likelihood: -18.11
```
Unless you have a really burning need to do this with Newton-Raphson rather than with a gradient-based "quasi-Newton" method, I would guess that this is good enough. (It doesn't sound like you have strong technical reasons to do so, other than "that's what other people do in my field": a good reason, all other things being equal, but not enough in this case to make me dig around to implement N-R when similar methods are easily available and work fine.)
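For comparison, here is what a bare-bones Newton-Raphson iteration looks like. It is a hypothetical sketch, not the algorithm inside `maxNR`: it needs the exact gradient and Hessian (which you would have to derive, or approximate numerically, for `log.likelihoodSL()`), and it enforces the box simply by halving any step that would leave it. That is only reasonable when the maximum lies in the interior, as it does here; the quadratic test function is a made-up stand-in with its maximum near the estimates above.

```r
# Hypothetical sketch (not how maxLik::maxNR is implemented): Newton-Raphson
# ascent on a log-likelihood, halving each step until the new point is
# strictly inside the box and the log-likelihood has not decreased.
newton_box <- function(ll, grad, hess, start, lower, upper,
                       tol = 1e-8, maxit = 100) {
  x <- start
  for (i in seq_len(maxit)) {
    step  <- -solve(hess(x), grad(x))  # full Newton step: -H^{-1} g
    alpha <- 1
    repeat {
      x_new <- x + alpha * step
      # Check the box first so ll() is never evaluated outside it
      if (isTRUE(all(x_new > lower & x_new < upper) && ll(x_new) >= ll(x))) break
      alpha <- alpha / 2
      if (alpha < 1e-10) stop("no feasible improving step; Hessian may not be negative definite")
    }
    x <- x_new
    if (max(abs(alpha * step)) < tol) break
  }
  x
}

# Toy quadratic log-likelihood with its maximum at theta = -1, rho = 0.6
ll   <- function(x) -(x[1] + 1)^2 - 4 * (x[2] - 0.6)^2
grad <- function(x) c(-2 * (x[1] + 1), -8 * (x[2] - 0.6))
hess <- function(x) matrix(c(-2, 0, 0, -8), nrow = 2)
newton_box(ll, grad, hess, start = c(0, 0.5), lower = c(-3, 0), upper = c(3, 1))
```

Quasi-Newton methods such as `L-BFGS-B` replace `hess(x)` with an approximation built up from successive gradients, which is why they need far less problem-specific code.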