
Why does calculating MSE in lasso regression give different outputs?


I am trying to run different regression models on the Prostate cancer data from the lasso2 package. When I use the lasso, I have seen two different methods to calculate the mean squared error, but they give me quite different results, so I would like to know whether I'm doing something wrong or whether it just means that one method is better than the other.

# Needs the following R packages.
library(lasso2)
library(glmnet)

# Gets the prostate cancer dataset
data(Prostate)

# Defines the Mean Square Error function 
mse = function(x,y) { mean((x-y)^2)}

# 75% of the sample size.
smp_size = floor(0.75 * nrow(Prostate))

# Sets the seed to make the partition reproducible.
set.seed(907)
train_ind = sample(seq_len(nrow(Prostate)), size = smp_size)

# Training set
train = Prostate[train_ind, ]

# Test set
test = Prostate[-train_ind, ]

# Creates matrices for independent and dependent variables.
xtrain = model.matrix(lpsa~. -1, data = train)
ytrain = train$lpsa
xtest = model.matrix(lpsa~. -1, data = test)
ytest = test$lpsa

# Fitting a linear model by Lasso regression on the "train" data set
pr.lasso = cv.glmnet(xtrain,ytrain,type.measure='mse',alpha=1)
lambda.lasso = pr.lasso$lambda.min

# Getting predictions on the "test" data set and calculating the mean square error
lasso.pred = predict(pr.lasso, s = lambda.lasso, newx = xtest) 

# Calculating MSE via the mse function defined above
mse.1 = mse(lasso.pred,ytest)
cat("MSE (method 1): ", mse.1, "\n")

# Calculating MSE via the cvm attribute inside the pr.lasso object
mse.2 = pr.lasso$cvm[pr.lasso$lambda == lambda.lasso]
cat("MSE (method 2): ", mse.2, "\n")

So these are the outputs I got for both MSE:

MSE (method 1): 0.4609978 
MSE (method 2): 0.5654089 

And they're quite different. Does anyone know why? Thanks a lot in advance for your help!

Samuel

asked Sep 14 '16 by chogall

People also ask

How do you interpret MSE in linear regression?

The mean squared error (MSE) tells you how close a regression line is to a set of points. It does this by taking the distances from the points to the regression line (these distances are the "errors"), squaring them, and averaging the squares. The squaring is necessary to remove any negative signs.

What does MSE tell you in regression?

The mean squared error measures how close a regression line is to a set of data points. It is a risk function corresponding to the expected value of the squared error loss. It is calculated by taking the average (the mean) of the squared errors between the data and the fitted function.

Is it better to have a higher or lower MSE?

There is no single "correct" value for MSE. Simply put, the lower the value the better, and 0 means the model is perfect.

Why do we use MSE for regression?

MSE is used to check how close estimates or forecasts are to actual values: the lower the MSE, the closer the forecast is to the actual values. It is used as a model evaluation measure for regression models, where a lower value indicates a better fit.
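
For concreteness, here is a minimal R sketch (not from the original post) of the squaring-and-averaging described above, using made-up numbers:

# Toy example (hypothetical values, for illustration only)
y    = c(1.0, 2.0, 3.0, 4.0)   # observed values
yhat = c(1.1, 1.9, 3.2, 3.8)   # predictions from some regression line
mean((y - yhat)^2)             # average of the squared residuals: 0.025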


1 Answer

As pointed out by @alistaire, in the first case you are using the test data to compute the MSE, while in the second case the MSE from the cross-validation (training) folds is reported, so it's not an apples-to-apples comparison.

We can do something like the following to make an apples-to-apples comparison (by keeping the fitted values on the training folds), and as we can see, mse.1 and mse.2 are exactly equal when computed on the same training folds (although the value is a little different from yours; this was run on my desktop with R version 3.1.2, x86_64-w64-mingw32, Windows 10):

# Needs the following R packages.
library(lasso2)
library(glmnet)

# Gets the prostate cancer dataset
data(Prostate)

# Defines the Mean Square Error function 
mse = function(x,y) { mean((x-y)^2)}

# 75% of the sample size.
smp_size = floor(0.75 * nrow(Prostate))

# Sets the seed to make the partition reproducible.
set.seed(907)
train_ind = sample(seq_len(nrow(Prostate)), size = smp_size)

# Training set
train = Prostate[train_ind, ]

# Test set
test = Prostate[-train_ind, ]

# Creates matrices for independent and dependent variables.
xtrain = model.matrix(lpsa~. -1, data = train)
ytrain = train$lpsa
xtest = model.matrix(lpsa~. -1, data = test)
ytest = test$lpsa

# Fitting a linear model by Lasso regression on the "train" data set
# keep the fitted values on the training folds
pr.lasso = cv.glmnet(xtrain,ytrain,type.measure='mse', keep=TRUE, alpha=1)
lambda.lasso = pr.lasso$lambda.min
lambda.id <- which(pr.lasso$lambda == pr.lasso$lambda.min)

# get the predicted values on the training folds with lambda.min (not from test data)
mse.1 = mse(pr.lasso$fit.preval[,lambda.id], ytrain)  # fit.preval holds the cross-validated predictions when keep=TRUE
cat("MSE (method 1): ", mse.1, "\n")

MSE (method 1):  0.6044496 

# Calculating MSE via the cvm attribute inside the pr.lasso object
mse.2 = pr.lasso$cvm[pr.lasso$lambda == lambda.lasso]
cat("MSE (method 2): ", mse.2, "\n")

MSE (method 2):  0.6044496 

mse.1 == mse.2
[1] TRUE
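
As a follow-up sketch (not part of the original answer): because keep=TRUE stores the cross-validated predictions in fit.preval, the whole cvm curve can be recomputed by hand for type.measure='mse', which makes it clear where method 2's number comes from. This assumes the default unit observation weights, so cvm is just the plain mean of the squared prevalidated residuals at each lambda.

# Recompute the CV mean squared error at every lambda from the prevalidated fits
cv.mse.byhand = apply(pr.lasso$fit.preval, 2, function(p) mean((p - ytrain)^2))
cv.mse.byhand[lambda.id]   # should reproduce mse.2 (and hence mse.1) above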
answered Sep 23 '22 by Sandipan Dey