
cv.glmnet fails for ridge, not lasso, for simulated data with coder error

Tags: r, glmnet

The error: Error in predmat[which, seq(nlami)] = preds : replacement has length zero

The context: data are simulated with a binary y, and there are n coders who each produce a noisy coding of the true y. The data are stacked n times, one copy per coder, and a model is fitted to the coder labels, trying to recover the true y.

The error occurs:

  1. with the L2 penalty (ridge, alpha=0), but not the L1 penalty (lasso, alpha=1).
  2. when the response is the coder Y, but not when it is the true Y.
  3. non-deterministically: whether it appears depends on the seed.

UPDATE: the error appears in glmnet versions after 1.9-8; version 1.9-8 does not fail.
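
One stopgap, not part of the original question: since 1.9-8 does not fail, the last known-good release can be pinned from the CRAN archive with devtools::install_version() (devtools is already attached in the sessions below). A minimal sketch, assuming a working build toolchain:

# Stopgap sketch: roll back to the last glmnet version that does not show the error
devtools::install_version("glmnet", version = "1.9-8",
                          repos = "https://cran.r-project.org")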

Reproduction

base data:

library(glmnet)
rm(list=ls())
set.seed(123)

num_obs=4000
n_coders=2
precision=.8

X <- matrix(rnorm(num_obs*20, sd=1), nrow=num_obs)
prob1 <- plogis(X %*% c(2, -2, 1, -1, rep(0, 16))) # yes many zeros, ignore
y_true <- rbinom(num_obs, 1, prob1)
dat <- data.frame(y_true = y_true, X = X)

create the coders:

classify <- function(true_y,precision){
  n=length(true_y)
  y_coder <- numeric(n)
  y_coder[which(true_y==1)] <- rbinom(n=length(which(true_y==1)),
                                      size=1,prob=precision)
  y_coder[which(true_y==0)] <- rbinom(n=length(which(true_y==0)),
                                      size=1,prob=(1-precision))
  return(y_coder)
}
y_codings <- sapply(rep(precision,n_coders),classify,true_y = dat$y_true)

stack it all:

expanded_data <- do.call(rbind,rep(list(dat),n_coders))
expanded_data$y_codings <- matrix(y_codings, ncol = 1)

reproduce the error:

Since the error depends on the seed, a loop is needed to hit a failing case. Only the first loop will fail; the other two will run to completion.

X <- as.matrix(expanded_data[,grep("X",names(expanded_data))])

for (i in 1:1000) cv.glmnet(x = X,y = expanded_data$y_codings,
                            family="binomial", alpha=0)  # will fail
for (i in 1:1000) cv.glmnet(x = X,y = expanded_data$y_codings,
                            family="binomial", alpha=1)  # will not fail
for (i in 1:1000) cv.glmnet(x = X,y = expanded_data$y_true,
                            family="binomial", alpha=0)  # will not fail

Any thoughts on where this comes from inside glmnet and how to avoid it? From my reading of cv.glmnet, the failure happens after the cross-validation fits, inside cvstuff = do.call(fun, list(outlist, lambda, x, y, weights, offset, foldid, type.measure, grouped, keep)), whose role I do not understand, which is why I cannot see how to avoid it.
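
Not part of the original question, but one way to pin down where this happens: snapshot the RNG state before each call, trap the error, and then step into cv.glmnet at the failing state (the second answer below uses a similar seed-saving idea). A minimal debugging sketch, assuming X and expanded_data as built above:

# Debugging sketch: find a failing RNG state, then step into cv.glmnet at that state
for (i in 1:1000) {
  seed_state <- .Random.seed   # snapshot the RNG state before the call
  fit <- tryCatch(
    cv.glmnet(x = X, y = expanded_data$y_codings, family = "binomial", alpha = 0),
    error = function(e) e
  )
  if (inherits(fit, "error")) {
    message("failed at iteration ", i, ": ", conditionMessage(fit))
    break
  }
}
# Restore the failing RNG state and step through cv.glmnet interactively,
# e.g. to inspect predmat, nlami and the per-fold lambda sequences
.Random.seed <- seed_state
debugonce(cv.glmnet)
cv.glmnet(x = X, y = expanded_data$y_codings, family = "binomial", alpha = 0)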

session info (Ubuntu and Windows)

R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] glmnet_2.0-2    foreach_1.4.3   Matrix_1.2-7.1  devtools_1.12.0

loaded via a namespace (and not attached):
 [1] httr_1.2.1       R6_2.2.0         tools_3.3.1      withr_1.0.2      curl_2.1        
 [6] memoise_1.0.0    codetools_0.2-15 grid_3.3.1       iterators_1.0.8  knitr_1.14      
[11] digest_0.6.10    lattice_0.20-34

and

R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] glmnet_2.0-2    foreach_1.4.3   Matrix_1.2-7.1  devtools_1.12.0

loaded via a namespace (and not attached):
 [1] httr_1.2.1       R6_2.2.0         tools_3.3.1      withr_1.0.2      curl_2.1        
 [6] memoise_1.0.0    codetools_0.2-15 grid_3.3.1       iterators_1.0.8  digest_0.6.10   
[11] lattice_0.20-34
Asked by Elad663 on Oct 20 '16



2 Answers

I had the same error in glmnet_2.0-5. It has something to do with how the lambda sequence is automatically generated in some situations. The solution is to provide your own lambdas.

For example:

cv.glmnet(x = X,
          y = expanded_data$y_codings,
          family="binomial", 
          alpha=0,
          lambda=exp(seq(log(0.001), log(5), length.out=100))) 

Kudos to https://github.com/lmweber/glmnet-error-example/blob/master/glmnet_error_example.R
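
A variant of the same workaround (a sketch, not from the original answer): instead of hard-coding a range, let glmnet() compute its lambda path once on the full data and reuse that fixed sequence for cross-validation, so every fold is evaluated at the same lambdas.

# Variant sketch: reuse the lambda path from a full-data glmnet() fit
full_fit <- glmnet(x = X, y = expanded_data$y_codings,
                   family = "binomial", alpha = 0)
cv_fit <- cv.glmnet(x = X, y = expanded_data$y_codings,
                    family = "binomial", alpha = 0,
                    lambda = full_fit$lambda)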

Answered by user2173836


Well, I just ran the first loop and it completed successfully. This is with glmnet 2.0.2.

This is more of a comment, but it's too big to fit: when running tests like this which depend on random numbers, you can save the seed as you go. This lets you jump to the middle of the testing without having to go back to the start each time.

Something like this:

results <- lapply(1:1000, function(x) {
    seed <- .Random.seed
    res <- try(glmnet(x, y, ...))  # so the code keeps running even if there's an error
    attr(res, "seed") <- seed
    res
})

Now you can check whether any of the runs failed by looking at the class of the results:

errs <- sapply(results, function(x) inherits(x, "try-error"))
any(errs)

And you can retry those runs that failed:

firstErr <- which(errs)[1]
.Random.seed <- attr(results[[firstErr]], "seed")
glmnet(x, y, ...)  # try failed run again

Session info:

R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.850    
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] glmnetUtils_0.55    RevoUtilsMath_8.0.3 RevoUtils_8.0.3     RevoMods_8.0.3      RevoScaleR_8.0.6   
[6] lattice_0.20-33     rpart_4.1-10       

loaded via a namespace (and not attached):
[1] Matrix_1.2-2     parallel_3.2.2   codetools_0.2-14 rtvs_1.0.0.0     grid_3.2.2      
[6] iterators_1.0.8  foreach_1.4.3    glmnet_2.0-2    

(That should be Windows 10, not 8; R 3.2.2 doesn't know about Win10)

Answered by Hong Ooi