The error: Error in predmat[which, seq(nlami)] = preds : replacement has length zero
The context: data is simulated with a binary y_true, but there are n coders who each produce a noisy coding of y_true. The data is stacked n times and a model is fitted on the codings, trying to recover y_true.
The error occurs with the L2 penalty (ridge, alpha = 0) but not with the L1 penalty (lasso, alpha = 1).
UPDATE: the error appears in versions after 1.9-8; 1.9-8 does not fail.
library(glmnet)
rm(list=ls())
set.seed(123)
num_obs=4000
n_coders=2
precision=.8
X <- matrix(rnorm(num_obs*20, sd=1), nrow=num_obs)
prob1 <- plogis(X %*% c(2, -2, 1, -1, rep(0, 16))) # yes many zeros, ignore
y_true <- rbinom(num_obs, 1, prob1)
dat <- data.frame(y_true = y_true, X = X)
# Simulate one coder: label each observation correctly with probability `precision`
classify <- function(true_y, precision){
  n <- length(true_y)
  y_coder <- numeric(n)
  y_coder[which(true_y == 1)] <- rbinom(n = length(which(true_y == 1)),
                                        size = 1, prob = precision)
  y_coder[which(true_y == 0)] <- rbinom(n = length(which(true_y == 0)),
                                        size = 1, prob = (1 - precision))
  return(y_coder)
}
y_codings <- sapply(rep(precision,n_coders),classify,true_y = dat$y_true)
expanded_data <- do.call(rbind,rep(list(dat),n_coders))
expanded_data$y_codings <- matrix(y_codings, ncol = 1)
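A quick sanity check (my addition, not in the original question) that the stacked codings behave as intended: each coder should agree with y_true at roughly the chosen precision of 0.8.
# Proportion of codings matching the true labels; should be close to `precision`
mean(expanded_data$y_codings == expanded_data$y_true)
table(coding = expanded_data$y_codings, truth = expanded_data$y_true)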
Since the error depends on the seed, a loop is necessary. Only the first loop will fail; the other two will finish.
X <- as.matrix(expanded_data[,grep("X",names(expanded_data))])
for (i in 1:1000) cv.glmnet(x = X, y = expanded_data$y_codings,
                            family = "binomial", alpha = 0) # will fail
for (i in 1:1000) cv.glmnet(x = X, y = expanded_data$y_codings,
                            family = "binomial", alpha = 1) # will not fail
for (i in 1:1000) cv.glmnet(x = X, y = expanded_data$y_true,
                            family = "binomial", alpha = 0) # will not fail
Any thoughts on where this comes from inside glmnet and how to avoid it? From my reading of cv.glmnet, the error occurs after the cross-validation fits, inside cvstuff = do.call(fun, list(outlist, lambda, x, y, weights, offset, foldid, type.measure, grouped, keep)), whose role I do not understand, which is why I cannot tell what causes the failure or how to work around it.
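For what it's worth, here is a rough diagnostic sketch (my own guess at the mechanism, not taken from the glmnet source): with a fixed foldid, fit each fold's model manually and compare how many lambda values the fold fit returns against the full-data fit. A fold that returns far fewer (or no) lambda values than the full fit is the kind of situation in which the assignment into predmat could receive a zero-length replacement.
# Diagnostic sketch (assumes X and expanded_data from above; the per-fold
# lambda-length comparison is my guess at the mechanism, not confirmed)
y <- as.numeric(expanded_data$y_codings)
foldid <- sample(rep(1:10, length.out = nrow(X)))
full_fit <- glmnet(X, y, family = "binomial", alpha = 0)
fold_lengths <- sapply(1:10, function(k) {
  fit_k <- glmnet(X[foldid != k, ], y[foldid != k], family = "binomial", alpha = 0)
  length(fit_k$lambda)  # compare with length(full_fit$lambda)
})
fold_lengths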
Session info (Ubuntu and Windows):
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] glmnet_2.0-2 foreach_1.4.3 Matrix_1.2-7.1 devtools_1.12.0
loaded via a namespace (and not attached):
[1] httr_1.2.1 R6_2.2.0 tools_3.3.1 withr_1.0.2 curl_2.1
[6] memoise_1.0.0 codetools_0.2-15 grid_3.3.1 iterators_1.0.8 knitr_1.14
[11] digest_0.6.10 lattice_0.20-34
and
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] glmnet_2.0-2 foreach_1.4.3 Matrix_1.2-7.1 devtools_1.12.0
loaded via a namespace (and not attached):
[1] httr_1.2.1 R6_2.2.0 tools_3.3.1 withr_1.0.2 curl_2.1
[6] memoise_1.0.0 codetools_0.2-15 grid_3.3.1 iterators_1.0.8 digest_0.6.10
[11] lattice_0.20-34
I had the same error in glmnet_2.0-5. It has something to do with how the lambdas are created automatically in some situations. The solution is to provide your own lambdas, e.g.:
cv.glmnet(x = X,
y = expanded_data$y_codings,
family="binomial",
alpha=0,
lambda=exp(seq(log(0.001), log(5), length.out=100)))
Kudos to https://github.com/lmweber/glmnet-error-example/blob/master/glmnet_error_example.R
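A variation on the same idea (my own suggestion, not from the linked example): rather than hard-coding the bounds, you can reuse the lambda sequence that a plain glmnet() fit on the full data generates, which keeps the grid on a scale appropriate for your data.
# Alternative: hand cv.glmnet the sequence a full-data glmnet fit would use
full_fit <- glmnet(x = X, y = expanded_data$y_codings, family = "binomial", alpha = 0)
cv.glmnet(x = X, y = expanded_data$y_codings,
          family = "binomial", alpha = 0,
          lambda = full_fit$lambda)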
Well, I just ran the first loop and it completed successfully. This is with glmnet 2.0.2.
This is more of a comment, but it's too big to fit: when running tests like this which depend on random numbers, you can save the seed as you go. This lets you jump to the middle of the testing without having to go back to the start each time.
Something like this:
results <- lapply(1:1000, function(i) {
  seed <- .Random.seed
  res <- try(glmnet(x, y, ...))  # try() keeps the loop running even if a fit errors
  attr(res, "seed") <- seed
  res
})
Now you can check if any of the runs failed, by looking at the class of the results:
errs <- sapply(results, function(x) inherits(x, "try-error"))
any(errs)
And you can retry those runs that failed:
firstErr <- which(errs)[1]
.Random.seed <- attr(results[[firstErr]], "seed")
glmnet(x, y, ...) # try failed run again
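Applied to the failing call from the question (assuming X and expanded_data as defined there), the same pattern would look something like this:
results <- lapply(1:1000, function(i) {
  seed <- .Random.seed
  res <- try(cv.glmnet(x = X, y = expanded_data$y_codings,
                       family = "binomial", alpha = 0),
             silent = TRUE)
  attr(res, "seed") <- seed
  res
})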
Session info:
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.850
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] glmnetUtils_0.55 RevoUtilsMath_8.0.3 RevoUtils_8.0.3 RevoMods_8.0.3 RevoScaleR_8.0.6
[6] lattice_0.20-33 rpart_4.1-10
loaded via a namespace (and not attached):
[1] Matrix_1.2-2 parallel_3.2.2 codetools_0.2-14 rtvs_1.0.0.0 grid_3.2.2
[6] iterators_1.0.8 foreach_1.4.3 glmnet_2.0-2
(That should be Windows 10, not 8; R 3.2.2 doesn't know about Win10)