Constrained Newton-Raphson estimation

Question

I am trying to use a Newton-Raphson algorithm in R to minimize a log-likelihood function that I wrote for a very specific problem. I will say honestly that estimation methods are above my head, but I know that many people in my field (psychometrics) use NR algorithms for estimation, so I am trying to use this method, at least to begin with. I have a series of nested functions that return a scalar as the log-likelihood estimate for a particular vector of data:

log.likelihoodSL <- function(x,sxdat1,item) {
  theta <- x[1]
  rho <- x[2]
  log.lik <- 0
  for (it in 1:length(sxdat1)) {
    val <- as.numeric(sxdat1[it])
    apars <- item[it,1:3]
    cpars <- item[it,4:6]
    log.lik <- log.lik + as.numeric(log.pSL(theta,rho,apars,cpars,val))
  }
  return(log.lik)
}

log.pSL <- function(theta,rho,apars,cpars,val) {
  p <- (rho * e.aSL(theta,apars,cpars,val)) + ((1-rho) * e.nrm(theta,apars,cpars,val))
  log.p <- log(p)
  return(log.p)
}

e.aSL <- function(theta,apars,cpars,val) {
  if (val==1) {
    aprob <- e.nrm(theta,apars,cpars,val)
  } else if (val==2) {
    aprob <- 1 - e.nrm(theta,apars,cpars,val)
  } else
    aprob <- 0
  return(aprob)
}

e.nrm <- function(theta,apars,cpars,val) {
  nprob <- exp(apars*theta + cpars)/sum(exp((apars*theta) + cpars))
  nprob <- nprob[val]
  return(nprob)
}

Those functions all call each other in turn, in the order presented. The call to the highest function is as follows:

max1 <- maxNR(log.likelihoodSL,grad=NULL,hess=NULL,start=x,print.level=1,sxdat1=sxdat1,item=item)

Here is a sample of the input data (which I call sxdat1 in this case):

> sxdat1
 V1  V2  V3  V4  V5  V6  V7  V8  V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 
  2   1   3   1   3   3   2   2   3   2   2   2   2   2   3   2   3   2 
V19 V20 
  2   2

And here is the variable item:

> item
             V1        V2         V3           V4         V5        V6
 [1,] 0.2494625 0.3785529 -0.6280155 -0.096817808 -0.7549263 0.8517441
 [2,] 0.2023690 0.4582290 -0.6605980 -0.191895013 -0.8391203 1.0310153
 [3,] 0.2044005 0.3019147 -0.5063152 -0.073135691 -0.6061725 0.6793082
 [4,] 0.2233619 0.4371988 -0.6605607 -0.160377714 -0.8233197 0.9836974
 [5,] 0.2257933 0.2851198 -0.5109131 -0.044494872 -0.5970246 0.6415195
 [6,] 0.2047308 0.3438725 -0.5486033 -0.104356236 -0.6693569 0.7737131
 [7,] 0.3402220 0.2724951 -0.6127172  0.050795183 -0.6639092 0.6131140
 [8,] 0.2513672 0.3263046 -0.5776718 -0.056203015 -0.6779823 0.7341853
 [9,] 0.2008285 0.3389165 -0.5397450 -0.103565987 -0.6589961 0.7625621
[10,] 0.2890680 0.2700661 -0.5591341  0.014251386 -0.6219001 0.6076488
[11,] 0.3127214 0.2572715 -0.5699929  0.041587479 -0.6204483 0.5788608
[12,] 0.2697048 0.2965255 -0.5662303 -0.020115553 -0.6470669 0.6671825
[13,] 0.2799978 0.3219374 -0.6019352 -0.031454750 -0.6929045 0.7243592
[14,] 0.2773233 0.2822723 -0.5595956 -0.003711768 -0.6314010 0.6351127
[15,] 0.2433519 0.2632824 -0.5066342 -0.014947878 -0.5774375 0.5923853
[16,] 0.2947281 0.3605812 -0.6553092 -0.049389825 -0.7619178 0.8113076
[17,] 0.2290081 0.3114185 -0.5404266 -0.061807853 -0.6388839 0.7006917
[18,] 0.3824588 0.2543871 -0.6368459  0.096053788 -0.6684247 0.5723709
[19,] 0.2405821 0.3903595 -0.6309416 -0.112333048 -0.7659758 0.8783089
[20,] 0.2424331 0.3028480 -0.5452811 -0.045311136 -0.6360968 0.6814080

The two parameters over which I want to minimize the function log.likelihood() are theta and rho, and I want to constrain theta to be between -3 and 3, and rho to be between 0 and 1, but I don't know how to do this with the current set-up. Can anybody help me out? Do I need to use a different estimation method from the Newton-Raphson method or is there a way to implement this using the function maxNR, which is from the package maxLik, that I am currently using? Thanks!

Edit: the vector x, which contains the start values for the parameters theta and rho, is just c(0,0) because that is the "average" or "default" assumption for these parameters (in terms of their substantive interpretation).

Ben Bolker · Accepted Answer

The data in a more convenient form:

sxdat1 <- c(2,1,3,1,3,3,2,2,3,2,2,2,2,2,3,2,3,2,2,2)
item <- matrix(c(
0.2494625,0.3785529,-0.6280155,-0.096817808,-0.7549263,0.8517441,
0.2023690,0.4582290,-0.6605980,-0.191895013,-0.8391203,1.0310153,
0.2044005,0.3019147,-0.5063152,-0.073135691,-0.6061725,0.6793082,
0.2233619,0.4371988,-0.6605607,-0.160377714,-0.8233197,0.9836974,
0.2257933,0.2851198,-0.5109131,-0.044494872,-0.5970246,0.6415195,
0.2047308,0.3438725,-0.5486033,-0.104356236,-0.6693569,0.7737131,
0.3402220,0.2724951,-0.6127172,0.050795183,-0.6639092,0.6131140,
0.2513672,0.3263046,-0.5776718,-0.056203015,-0.6779823,0.7341853,
0.2008285,0.3389165,-0.5397450,-0.103565987,-0.6589961,0.7625621,
0.2890680,0.2700661,-0.5591341,0.014251386,-0.6219001,0.6076488,
0.3127214,0.2572715,-0.5699929,0.041587479,-0.6204483,0.5788608,
0.2697048,0.2965255,-0.5662303,-0.020115553,-0.6470669,0.6671825,
0.2799978,0.3219374,-0.6019352,-0.031454750,-0.6929045,0.7243592,
0.2773233,0.2822723,-0.5595956,-0.003711768,-0.6314010,0.6351127,
0.2433519,0.2632824,-0.5066342,-0.014947878,-0.5774375,0.5923853,
0.2947281,0.3605812,-0.6553092,-0.049389825,-0.7619178,0.8113076,
0.2290081,0.3114185,-0.5404266,-0.061807853,-0.6388839,0.7006917,
0.3824588,0.2543871,-0.6368459,0.096053788,-0.6684247,0.5723709,
0.2405821,0.3903595,-0.6309416,-0.112333048,-0.7659758,0.8783089,
0.2424331,0.3028480,-0.5452811,-0.045311136,-0.6360968,0.6814080),
               byrow=TRUE,ncol=6)

Using maxNR:

library(maxLik)
x <- c(0,0)
max1 <- maxNR(log.likelihoodSL,grad=NULL,hess=NULL,start=x,
              print.level=1,sxdat1=sxdat1,item=item)

Note warnings incurred when rho wanders negative. However, maxNR can recover from this and gets an estimate (theta=-1, rho=0.63) that is in the interior of the feasible set. L-BFGS-B can't handle non-finite interim results, but the bounds keep it the algorithm away from those problematic regions.

I chose to do this with bbmle rather than in optim: bbmle is a wrapper for optim (and other optimization tools) that offers some nice features specific to likelihood estimation (profiling, confidence intervals, likelihood ratio tests between models, etc.).

library(bbmle)

## mle2() wants a NEGATIVE log-likelihood
NLL <- function(x,sxdat1,item) {
    -log.likelihoodSL(x,sxdat1,item)
}

edit: in an earlier version I used control=list(fnscale=-1) to tell the optimizer that I was passing a log-likelihood function that should be maximized rather than minimized; this gets to the right answer, but subsequent attempts to use the results might get very confusing because the package isn't accounting for this possiblity (e.g. the sign of the reported log-likelihood is wrong). This could be fixed in the package, but I'm not sure it's worth it.

## needed when objective function takes a vector of args rather than
##  separate named arguments:
parnames(NLL) <- c("theta","rho")
(m1 <- mle2(NLL,start=c(theta=0,rho=0.5),method="L-BFGS-B",
     lower=c(theta=-3,rho=2e-3),upper=c(theta=3,rho=1-2e-3),
     data=list(sxdat1=sxdat1,item=item)))

A couple of points here:

started at rho=0.5 rather than on the boundary
set the rho boundaries slightly inside [0,1] (L-BFGS-B doesn't always perfectly respect boundaries when computing finite difference approximations of derivatives)
specified the input data in the data argument

In this case I get the same results as maxNR.

 ## Call:
 ## mle2(minuslogl = NLL, start = c(theta = 0, rho = 0.5), 
 ##     method = "L-BFGS-B", data = list(sxdat1 = sxdat1, item = item), 
 ##     lower = c(theta = -3, rho = 0.002), upper = c(theta = 3, 
 ##         rho = 1 - 0.002), control = list(fnscale = -1))
 ## 
 ## Coefficients:
 ##      theta        rho 
 ## -1.0038531  0.6352782 
 ## 
 ## Log-likelihood: -18.11

Unless you have a really burning need to do this with Newton-Raphson rather than with a gradient-based "quasi-Newton" method, I would guess that this is good enough. (It doesn't sound like you have strong technical reasons to do so, other than "that's what other people do in my field" -- a good reason, all other things being equal, but not enough in this case to make me dig around to implement N-R when similar methods are easily available and work fine.)

Constrained Newton-Raphson estimation

Tags:

r

Jon

1 Answers

Ben Bolker

Recent Activity

Donate For Us

Constrained Newton-Raphson estimation

Tags:

r

Jon

1 Answers

Ben Bolker

Related questions

Recent Activity

Donate For Us