How to optimize parameters using genetic algorithms

I'd like to optimize three parameters (gamma, cost and epsilon) in eps-regression (SVR) using GA in R. Here's what I've done.

library(e1071)
data(Ozone, package="mlbench")
a<-na.omit(Ozone)
index<-sample(1:nrow(a), trunc(nrow(a)/3))
trainset<-a[index,]
testset<-a[-index,]
model<-svm(V4 ~ .,data=trainset, cost=0.1, gamma=0.1, epsilon=0.1, type="eps-regression", kernel="radial")
error<-model$residuals
rmse <- function(error) # root mean squared error
{
  sqrt(mean(error^2))
}
rmse(error)

Here, I set cost, gamma and epsilon each to 0.1, but I don't think these are the best values, so I'd like to use a genetic algorithm to optimize them.

GA <- ga(type = "real-valued", fitness = rmse,
         min = c(0.1,3), max = c(0.1,3),
         popSize = 50, maxiter = 100)

Here, I used RMSE as the fitness function, but I think the fitness function has to take the parameters being optimized as its arguments. However, the SVR objective function is too complicated to write out in R code, and I searched for it for a long time without success. Could someone who knows both SVR and GA, or who has experience optimizing SVR parameters with a GA, please help me?

jihoon asked Aug 15 '15 15:08


1 Answer

In such an application, the parameters whose values are to be optimized (in your case, cost, gamma and epsilon) are passed as arguments to the fitness function, which then fits the model, evaluates it, and returns a measure of model performance as the fitness value. Therefore, you never need to write out the explicit form of the SVR objective function.
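To make the "fitness is just a function of the parameter vector" idea concrete, here is a from-scratch sketch of what a real-valued GA does with such a function. `simple_ga` is a hypothetical helper written for illustration only (it is not part of the GA package, whose `ga()` uses more refined operators); it assumes tournament selection, blend crossover and uniform mutation, and maximizes a toy fitness instead of an SVR error.

```r
# Minimal real-valued genetic algorithm (illustrative sketch, NOT the GA package).
# 'fitness' takes a numeric parameter vector and returns a value to MAXIMIZE.
simple_ga <- function(fitness, lower, upper, pop_size = 40, max_iter = 60,
                      mutation_rate = 0.1, elitism = 2) {
  n_par <- length(lower)
  # Initial population: each column drawn uniformly within its bounds
  pop <- matrix(runif(pop_size * n_par,
                      rep(lower, each = pop_size), rep(upper, each = pop_size)),
                nrow = pop_size)
  for (iter in seq_len(max_iter)) {
    fit <- apply(pop, 1, fitness)
    # Elitism: the best individuals survive unchanged
    elite <- pop[order(fit, decreasing = TRUE)[seq_len(elitism)], , drop = FALSE]
    # Tournament selection: the fitter of two randomly chosen individuals
    select_one <- function() {
      i <- sample.int(pop_size, 2)
      pop[i[which.max(fit[i])], ]
    }
    children <- matrix(NA_real_, pop_size - elitism, n_par)
    for (k in seq_len(nrow(children))) {
      p1 <- select_one()
      p2 <- select_one()
      w <- runif(n_par)                       # blend crossover stays inside the bounds
      child <- w * p1 + (1 - w) * p2
      mutate <- runif(n_par) < mutation_rate  # uniform mutation of random genes
      child[mutate] <- runif(sum(mutate), lower[mutate], upper[mutate])
      children[k, ] <- child
    }
    pop <- rbind(elite, children)
  }
  fit <- apply(pop, 1, fitness)
  best <- which.max(fit)
  list(solution = pop[best, ], fitness = fit[best])
}

# Toy usage: maximize minus the squared distance to (1, 2), so the optimum is (1, 2)
set.seed(1)
res <- simple_ga(function(x) -sum((x - c(1, 2))^2), lower = c(-5, -5), upper = c(5, 5))
res$solution
```

In the SVR case, the toy lambda would be replaced by a function that fits `svm()` with `x[1]`, `x[2]`, `x[3]` as cost, gamma and epsilon and returns minus an error measure.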

In the implementation below, I used 5-fold cross-validation to estimate the RMSE for a given set of parameters. Since the GA package maximizes the fitness function, I defined the fitness for a given set of parameter values as minus the average RMSE over the cross-validation folds. Hence, the maximum fitness that can be attained is zero.
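The cross-validated, negated-RMSE fitness can be tried without any add-on packages. In this sketch, ridge regression on the built-in `mtcars` data is a hypothetical stand-in for `svm()`, with the penalty `lambda` playing the role of the tuned parameter; `ridge_fit`, `ridge_predict` and `fitness_ridge` are illustrative names, not from any library. It also assigns folds with `sample(rep(1:K, ...))`, which keeps fold sizes balanced.

```r
# k-fold cross-validated RMSE as a GA fitness (sketch; ridge regression stands in for svm)
ridge_fit <- function(X, y, lambda) {
  # Center predictors and response so the intercept is not penalized
  x_mean <- colMeans(X)
  y_mean <- mean(y)
  Xc <- sweep(X, 2, x_mean)
  beta <- solve(crossprod(Xc) + lambda * diag(ncol(X)), crossprod(Xc, y - y_mean))
  list(beta = beta, x_mean = x_mean, y_mean = y_mean)
}

ridge_predict <- function(fit, X) {
  drop(sweep(X, 2, fit$x_mean) %*% fit$beta) + fit$y_mean
}

rmse <- function(error) sqrt(mean(error^2))

X <- as.matrix(mtcars[, -1])  # all numeric predictors
y <- mtcars$mpg
K <- 5
set.seed(42)
fold_inds <- sample(rep(1:K, length.out = nrow(X)))  # balanced fold sizes

# Fitness to be maximized: minus the mean RMSE over the K held-out folds (never above 0)
fitness_ridge <- function(x) {
  lambda <- x[1]
  rmse_vals <- sapply(1:K, function(i) {
    train <- fold_inds != i
    fit <- ridge_fit(X[train, , drop = FALSE], y[train], lambda)
    rmse(ridge_predict(fit, X[!train, , drop = FALSE]) - y[!train])
  })
  -mean(rmse_vals)
}

# Compare a few candidate penalties by hand
sapply(c(0.01, 1, 100), fitness_ridge)
```

A function of this shape could then be handed to the GA package, e.g. `ga(type = "real-valued", fitness = fitness_ridge, lower = 1e-3, upper = 100)`.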

Here is the full implementation:

library(e1071)
library(GA)

data(Ozone, package="mlbench")
Data <- na.omit(Ozone)

# Setup the data for cross-validation
K = 5 # 5-fold cross-validation
fold_inds <- sample(1:K, nrow(Data), replace = TRUE)
lst_CV_data <- lapply(1:K, function(i) list(
    train_data = Data[fold_inds != i, , drop = FALSE], 
    test_data = Data[fold_inds == i, , drop = FALSE]))

# Given the values of parameters 'cost', 'gamma' and 'epsilon', return the rmse of the model over the test data
evalParams <- function(train_data, test_data, cost, gamma, epsilon) {
    # Train
    model <- svm(V4 ~ ., data = train_data, cost = cost, gamma = gamma, epsilon = epsilon, type = "eps-regression", kernel = "radial")
    # Test
    # Root mean squared error of the predictions on the held-out fold
    rmse <- sqrt(mean((predict(model, newdata = test_data) - test_data$V4) ^ 2))
    return (rmse)
}

# Fitness function (to be maximized)
# Parameter vector x is: (cost, gamma, epsilon)
fitnessFunc <- function(x, Lst_CV_Data) {
    # Retrieve the SVM parameters
    cost_val <- x[1]
    gamma_val <- x[2]
    epsilon_val <- x[3]

    # Use cross-validation to estimate the RMSE for each split of the dataset
    rmse_vals <- sapply(Lst_CV_Data, function(in_data) with(in_data, 
        evalParams(train_data, test_data, cost_val, gamma_val, epsilon_val)))

    # As fitness measure, return minus the average rmse (over the cross-validation folds), 
    # so that by maximizing fitness we are minimizing the rmse
    return (-mean(rmse_vals))
}

# Range of the parameter values to be tested
# Parameters are: (cost, gamma, epsilon)
theta_min <- c(cost = 1e-4, gamma = 1e-3, epsilon = 1e-2)
theta_max <- c(cost = 10, gamma = 2, epsilon = 2)

# Run the genetic algorithm
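# Extra arguments (here Lst_CV_Data) are passed on to the fitness function;
# GA >= 3.0 uses 'lower'/'upper' (the older 'min'/'max' are deprecated)
results <- ga(type = "real-valued", fitness = fitnessFunc, Lst_CV_Data = lst_CV_data,
    names = names(theta_min),
    lower = theta_min, upper = theta_max,
    popSize = 50, maxiter = 100)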

summary(results)

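which produced the following results for the parameter ranges specified above (these ranges may need adjusting for other data, and the exact numbers depend on the random fold assignment). Note that this run used the mean squared error, without the square root, as the per-fold measure, so the fitness value of -14.66 corresponds to an RMSE of about 3.83: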

GA results: 
Iterations             = 100 
Fitness function value = -14.66315 
Solution               = 
         cost      gamma    epsilon
[1,] 2.643109 0.07910103 0.09864132
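Once the GA has finished, the tuned values can be used to fit a final model on all the data. This sketch assumes the e1071 and mlbench packages are installed and, to stay self-contained, hard-codes the solution reported above instead of re-running the GA; after a fresh run you would read it from `results@solution[1, ]` (the GA package stores the best solution(s) in that slot).

```r
library(e1071)

data(Ozone, package = "mlbench")
Data <- na.omit(Ozone)

# Tuned values copied from the GA summary above
best <- c(cost = 2.643109, gamma = 0.07910103, epsilon = 0.09864132)

# Refit on all data with the tuned parameters
final_model <- svm(V4 ~ ., data = Data,
                   cost = best[["cost"]], gamma = best[["gamma"]],
                   epsilon = best[["epsilon"]],
                   type = "eps-regression", kernel = "radial")

# Training-set RMSE, computed as in the question
rmse <- function(error) sqrt(mean(error^2))
train_rmse <- rmse(final_model$residuals)
train_rmse
```

The training RMSE will be optimistic; the cross-validated value used as the fitness is the better estimate of performance on new data.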
tguzella answered Nov 16 '22 00:11