R caret model evaluation with alternate performance metric

I'm using R's caret package to do some grid search and model evaluation. I have a custom evaluation metric that is a weighted average of absolute error. Weights are assigned at the observation level.

X <- c(1,1,2,0,1) #feature 1
w <- c(1,2,2,1,1) #weights
Y <- 1:5 #target, continuous

#assume I run a model using X as features and Y as target and get a vector of predictions

mymetric <- function(predictions, target, weights){
  # weighted mean absolute error: weights are given per observation
  v <- sum(abs(target - predictions) * weights) / sum(weights)
  return(v)
}
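
For instance, with a hypothetical vector of predictions (preds is made up for illustration):

preds <- c(1.2, 2.4, 2.9, 3.8, 5.1)   # stand-in for the model's predictions
mymetric(preds, Y, w)                 # weighted mean absolute error, roughly 0.214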

Here an example is given of how to use summaryFunction to define a custom evaluation metric for caret's train(). To quote:

The trainControl function has an argument called summaryFunction that specifies a function for computing performance. The function should have these arguments:

data is a reference for a data frame or matrix with columns called obs and pred for the observed and predicted outcome values (either numeric data for regression or character values for classification). Currently, class probabilities are not passed to the function. The values in data are the held-out predictions (and their associated reference values) for a single combination of tuning parameters. If the classProbs argument of the trainControl object is set to TRUE, additional columns in data will be present that contain the class probabilities. The names of these columns are the same as the class levels.

lev is a character string that has the outcome factor levels taken from the training data. For regression, a value of NULL is passed into the function.

model is a character string for the model being used (i.e. the value passed to the method argument of train).
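
For concreteness, a summaryFunction built along these lines (ignoring the weights, since they are not part of data) might look like the sketch below; maeSummary is just a name I made up, and it would be passed via trainControl(summaryFunction = maeSummary) together with metric = "MAE" and maximize = FALSE in train():

maeSummary <- function(data, lev = NULL, model = NULL) {
  out <- mean(abs(data$obs - data$pred))  # unweighted: no weights are available here
  names(out) <- "MAE"
  out
}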

I cannot quite figure out how to pass the observation weights to summaryFunction.

asked Apr 14 '14 by ADJ


People also ask

What is the use of the trainControl() function?

The trainControl function generates parameters that further control how models are created. Its method argument selects the resampling method: "boot", "cv", "LOOCV", "LGOCV", "repeatedcv", "timeslice", "none" and "oob".
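
As a minimal sketch (maeSummary is the hypothetical summary function sketched above, not part of caret):

library(caret)

ctrl <- trainControl(method = "cv",              # 5-fold cross-validation
                     number = 5,
                     summaryFunction = maeSummary)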

What is tuneGrid in R?

The tuneGrid parameter lets us specify exactly which values the tuning parameters will take, while tuneLength only limits how many default candidate values are generated per parameter.
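
For example, with a method such as glmnet (tuning parameters alpha and lambda), an explicit grid could be supplied as in this sketch (the feature matrix and outcome are assumed to exist):

myGrid <- expand.grid(alpha  = c(0, 0.5, 1),
                      lambda = 10^seq(-3, 0, length.out = 5))
# fit <- train(x = X_mat, y = Y, method = "glmnet", tuneGrid = myGrid)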

How does varImp work in R?

The varImp function tracks the changes in model statistics, such as the GCV, for each predictor and accumulates the reduction in the statistic when each predictor's feature is added to the model. This total reduction is used as the variable importance measure.
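
In practice (a sketch, assuming fit is an object returned by train()):

imp <- varImp(fit)   # model-specific importance where available
print(imp)
plot(imp)            # plot the importance scores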

What does train() do in R?

This function sets up a grid of tuning parameters for a number of classification and regression routines, fits each model and calculates a resampling-based performance measure.
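
A minimal sketch (dat is a hypothetical data frame with a numeric target Y):

fit <- train(Y ~ ., data = dat,
             method = "lm",
             trControl = trainControl(method = "cv", number = 5))
fit$results   # resampled performance for each candidate parameter setting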


1 Answer

You can't pass in weights directly to the summary function, which is an oversight since you can pass them to the modeling function. If the underlying model accommodates weights, they are used to produce the predicted values.
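
For example, case weights can be supplied to the fitting routine through train()'s weights argument (a sketch using lm, which supports weighted least squares; dat is a hypothetical data frame holding the features, target and weights):

fit <- train(Y ~ X, data = dat,
             method = "lm",
             weights = dat$w,      # used by the model fit, not by the summary function
             trControl = trainControl(method = "cv", number = 5))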

I'll add that to the next release.

Max

answered Sep 23 '22 by topepo