I'd like to optimize three parameters (gamma, cost and epsilon) in eps-regression (SVR) using GA in R. Here's what I've done.
library(e1071)
data(Ozone, package="mlbench")
a<-na.omit(Ozone)
index<-sample(1:nrow(a), trunc(nrow(a)/3))
trainset<-a[index,]
testset<-a[-index,]
model<-svm(V4 ~ .,data=trainset, cost=0.1, gamma=0.1, epsilon=0.1, type="eps-regression", kernel="radial")
error<-model$residuals
rmse <- function(error) # root mean squared error
{
sqrt(mean(error^2))
}
rmse(error)
Here, I set cost, gamma and epsilon to 0.1 each, but I don't think these are the best values. So, I'd like to employ a genetic algorithm (GA) to optimize these parameters.
GA <- ga(type = "real-valued", fitness = rmse,
min = c(0.1,3), max = c(0.1,3),
popSize = 50, maxiter = 100)
Here, I used RMSE as the fitness function, but I think the fitness function has to include the parameters that are to be optimized. However, in SVR the objective function is too complicated to write out in R code; I searched for it for a long time, but to no avail. If anyone knows both SVR and GA, or has experience optimizing SVR parameters with a GA, please help.
In such an application, one passes the parameters whose values are to be optimized (in your case, cost, gamma and epsilon) as arguments of the fitness function, which then runs the model fitting + evaluation routine and uses a measure of model performance as the fitness. Therefore, the explicit form of the SVR objective function is not directly relevant.
In the implementation below, I used 5-fold cross-validation to estimate the RMSE for a given set of parameters. In particular, since package GA maximizes the fitness function, I have written the fitness value for a given set of parameters as minus the average RMSE over the cross-validation folds. Hence, the maximum fitness that can be attained is zero.
Here it is:
library(e1071)
library(GA)
data(Ozone, package="mlbench")
Data <- na.omit(Ozone)
# Setup the data for cross-validation
K = 5 # 5-fold cross-validation
fold_inds <- sample(1:K, nrow(Data), replace = TRUE)
lst_CV_data <- lapply(1:K, function(i) list(
train_data = Data[fold_inds != i, , drop = FALSE],
test_data = Data[fold_inds == i, , drop = FALSE]))
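As a quick sanity check (my addition, not strictly necessary), you can confirm that the random fold assignment gives roughly balanced folds before fitting anything:
# Each fold should contain roughly nrow(Data)/K rows
table(fold_inds)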
# Given the values of parameters 'cost', 'gamma' and 'epsilon', return the rmse of the model over the test data
evalParams <- function(train_data, test_data, cost, gamma, epsilon) {
# Train
model <- svm(V4 ~ ., data = train_data, cost = cost, gamma = gamma, epsilon = epsilon, type = "eps-regression", kernel = "radial")
# Test
rmse <- sqrt(mean((predict(model, newdata = test_data) - test_data$V4) ^ 2))
return (rmse)
}
# Fitness function (to be maximized)
# Parameter vector x is: (cost, gamma, epsilon)
fitnessFunc <- function(x, Lst_CV_Data) {
# Retrieve the SVM parameters
cost_val <- x[1]
gamma_val <- x[2]
epsilon_val <- x[3]
# Use cross-validation to estimate the RMSE for each split of the dataset
rmse_vals <- sapply(Lst_CV_Data, function(in_data) with(in_data,
evalParams(train_data, test_data, cost_val, gamma_val, epsilon_val)))
# As fitness measure, return minus the average rmse (over the cross-validation folds),
# so that by maximizing fitness we are minimizing the rmse
return (-mean(rmse_vals))
}
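Before launching the GA (which refits K models per candidate and is therefore relatively slow), I find it useful to spot-check the fitness function with an arbitrary, hand-picked parameter vector; this also shows how ga() will call it, passing the candidate solution as x and forwarding lst_CV_data through its ... argument. The values below are just an illustration:
# Should return a single negative number (minus the average CV RMSE)
fitnessFunc(c(1, 0.1, 0.1), lst_CV_data)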
# Range of the parameter values to be tested
# Parameters are: (cost, gamma, epsilon)
theta_min <- c(cost = 1e-4, gamma = 1e-3, epsilon = 1e-2)
theta_max <- c(cost = 10, gamma = 2, epsilon = 2)
# Run the genetic algorithm
results <- ga(type = "real-valued", fitness = fitnessFunc, lst_CV_data,
names = names(theta_min),
lower = theta_min, upper = theta_max,
popSize = 50, maxiter = 100)
summary(results)
which produces results along the following lines (for the parameter ranges I specified, which may require fine-tuning based on the data):
GA results:
Iterations = 100
Fitness function value = -14.66315
Solution =
cost gamma epsilon
[1,] 2.643109 0.07910103 0.09864132
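Finally, the tuned parameter values are stored in the solution slot of the returned ga object, so you can refit a final SVR with them; this last step is a sketch I added, not part of the GA run itself:
# Best parameter vector found by the GA (named: cost, gamma, epsilon)
best_params <- results@solution[1, ]
# Refit the SVR on the full data set using the tuned parameters
final_model <- svm(V4 ~ ., data = Data,
                   cost = best_params["cost"],
                   gamma = best_params["gamma"],
                   epsilon = best_params["epsilon"],
                   type = "eps-regression", kernel = "radial")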