I have a dataset containing an independent variable and a set of dependent variables. I'd like to fit a function to each set of independent variables, using a bootstrapped nonlinear least squares procedure. In some cases, the independent variables are 'good quality', i.e. fit the function reasonably well. In other cases, they're noisy.
In all cases, I can use nls()
to get an estimate of the parameters. However, when the data are noisy, a bootstrap throws the error Error in nls(...) : singular gradient
. I can understand why nls
fitting to noisy data would fail, e.g. by failing to converge after too many iterations, but I don't understand why it is a singular gradient error, and why I only get it resampled datasets of poor quality.
Code:
require(ggplot2)
require(plyr)
require(boot)
# Data are in long form: columns are 'enzyme', 'x', and 'y'
enz <- read.table("http://dl.dropbox.com/s/ts3ruh91kpr47sj/SE.txt", header=TRUE)
# Nonlinear formula to fit to data
mmFormula <- formula(y ~ (x*Vmax) / (x + Km))
nls
is perfectly capable of fitting the data (even if in some cases, like a
, I doubt the model fits the data.
# Use nls to fit mmFormula to the data - this works well enough
fitDf <- ddply(enz, .(enzyme), function(x) coefficients(nls(mmFormula, x, start=list(Km=100, Vmax=0.5))))
# Create points to plot for the simulated fits
xGrid <- 0:200
simFits <- dlply(fitDf, .(enzyme), function(x) data.frame(x=xGrid, y=(xGrid * x$Vmax)/(xGrid + x$Km)))
simFits <- ldply(simFits, identity)
ggplot() + geom_point(data=enz, aes(x=x, y=y)) + geom_line(data=simFits, aes(x=x, y=y)) +
facet_wrap(~enzyme, scales="free_y") + aes(ymin=0)
Bootstrapping works fine for the good quality data:
# Function to pass to bootstrap; returns coefficients of nls fit to formula
nlsCoef <- function(df, i) {
KmGuess <- median(df$x)
VmaxGuess <- max(df$y)
dfSamp <- df[i,]
nlsCoef <- coefficients(nls(mmFormula, dfSamp, start=list(Km=100, Vmax=0.5)))
}
eBoot <- boot(subset(enz, enzyme=="e"), nlsCoef, R=1000) #No error
But not for the poor quality data
dBoot <- boot(subset(enz, enzyme=="d"), nlsCoef, R=10)
> Error in nls(mmFormula, dfSamp, start = list(Km = KmGuess, Vmax = VmaxGuess)) :
singular gradient
What is causing this error? And what should I do about it, given that I'd like to use plyr
to simultaneously perform lots of bootstrap simulations?
This allows you to examine what happens:
#modified function
#returns NAs if fit is not sucessfull
#does global assignment to store bootstrap permutations
nlsCoef <- function(df, i) {
KmGuess <- median(df$x)
VmaxGuess <- max(df$y)
dfSamp <- df[i,]
fit <- NULL
try(fit <- nls(mmFormula, dfSamp, start=list(Km=100, Vmax=0.5)))
if(!is.null(fit)){
res <- coef(fit)
} else{
res <- c(Km=NA,Vmax=NA)
}
istore[k,] <<- i
k <<- k+1
res
}
n <- 100
istore <- matrix(nrow=n+1,ncol=9)
k <- 1
dat <- subset(enz, enzyme=="d")
dBoot <- boot(dat, nlsCoef, R=n)
#permutations that create samples that cannot be fitted
nais <- istore[-1,][is.na(dBoot$t[,1]),]
#look at first bootstrap sample
#that could not be fitted
samp <- dat[nais[1,],]
plot(y~x,data=samp)
fit <- nls(mmFormula, samp, start=list(Km=100, Vmax=0.5))
#error
You can also use a self-starting model:
try(fit <- nls(y ~ SSmicmen(x, Vmax, Km), data = dfSamp))
With that error messages become a bit more informative. E.g., one error is
too few distinct input values to fit a Michaelis-Menten model
which means, that some bootstrap samples contain less then three distinct concentrations. But there are also some other errors:
step factor 0.000488281 reduced below 'minFactor' of 0.000976562
which you might be able to avoid by decreasing minFactor
.
The following are nasty. You could try different fitting algorithms or starting values with those:
singular gradient matrix at initial parameter estimates
singular gradient
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With