Following up on "Pass rows of a data frame as arguments to a function in R with column names specifying the arguments":
I want to train the following model with different combinations of parameters:
library(xgboost)
library(Matrix)
df <- data.frame(y = sample(0:1, 1000, replace = TRUE),
                 a = rnorm(1000),
                 b = rnorm(1000),
                 c = rnorm(1000),
                 d = rnorm(1000))
train <- sparse.model.matrix(object = y ~ . - 1, data = df)
model <- xgboost(data = train,
                 label = df$y,
                 # parameters
                 nrounds = 10,
                 subsample = 0.5,
                 colsample_bytree = 0.8)
I created a grid with the parameters, and I want to pass the rows of the grid into the xgboost function while keeping the data and label arguments constant.
param <- expand.grid(nrounds = c(10, 50, 100),
                     subsample = c(0.5, 0.8, 0.9),
                     colsample_bytree = c(0.8))
I would like to pass the arguments using the column names to specify them (if matching by column name is not an option, matching by column order would do as well), since this would make the call reusable for different functions.
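You can use the apply() function to apply a function to each row of a matrix or data frame in R. Its form is apply(X, MARGIN, FUN), where X is the matrix or data frame, MARGIN is the dimension to operate across (1 for rows, 2 for columns), and FUN is the function to apply. Note that apply() converts a data frame to a matrix first, so columns of mixed types are coerced to a common type.
In R, functions are ordinary objects: you can pass a function as an argument, write an anonymous function inline as an argument, or assign a function to a new name.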
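As a rough base-R sketch of the row-wise idea, each row of the grid can be turned into a named list and handed to do.call(), which matches list names to argument names. The function fit_stub below is a hypothetical stand-in for a real fitting function such as xgboost (it only records its arguments), and the small df is made up for illustration.

```r
# Hypothetical stand-in for a model-fitting function; it only records
# the arguments it receives so the example runs without extra packages.
fit_stub <- function(data, label, nrounds, subsample, colsample_bytree) {
  list(n_obs = nrow(data),
       nrounds = nrounds,
       subsample = subsample,
       colsample_bytree = colsample_bytree)
}

set.seed(1)
df <- data.frame(y = sample(0:1, 20, replace = TRUE),
                 a = rnorm(20),
                 b = rnorm(20))

param <- expand.grid(nrounds = c(10, 50, 100),
                     subsample = c(0.5, 0.8, 0.9),
                     colsample_bytree = 0.8)

# One call per grid row. as.list() on a one-row data frame gives a named
# list, and do.call() matches those names to the function's arguments.
fits <- lapply(seq_len(nrow(param)), function(i) {
  row_args <- as.list(param[i, , drop = FALSE])
  constant_args <- list(data = df[, -1], label = df$y)
  do.call(fit_stub, c(constant_args, row_args))
})
```

Because the matching is by name, the same pattern should carry over to any function whose argument names match the grid's column names.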
I had a similar problem, and looked in vain until I found it in Hadley's Advanced R. This allows you to pass on parameters as they appear in a dataframe, taking the names of columns as arguments. Read here:
https://adv-r.hadley.nz/functionals.html#pmap
So, here it is. There is a solution via purrr::pmap. It maps the rows of a data frame of parameters onto a function, matching column names to argument names.
This is my own code, which I recently used along with quanteda to mess around with the Kaggle SMS Spam dataset. These are the possibilities for my parameters:
library(purrr)     # pmap()
library(tibble)    # tibble(); data_frame() is deprecated
library(quanteda)  # dfm()
tolower <- tibble(tolower = c(TRUE, FALSE))
stem <- tibble(stem = c(TRUE, FALSE))
remove_punct <- tibble(remove_punct = c(TRUE, FALSE))
This is a bonus and not necessary, but I found I needed all of the combinations of my parameters to run a Naive Bayes model. Thanks to Y J via this SO post:
expand.grid.df <- function(...) Reduce(function(...) merge(..., by=NULL), list(...))
parameters <- expand.grid.df(tolower, stem, remove_punct)
So, now my parameters look like this:
> parameters
  tolower  stem remove_punct
1    TRUE  TRUE         TRUE
2   FALSE  TRUE         TRUE
3    TRUE FALSE         TRUE
4   FALSE FALSE         TRUE
5    TRUE  TRUE        FALSE
6   FALSE  TRUE        FALSE
7    TRUE FALSE        FALSE
8   FALSE FALSE        FALSE
And now for the magic: passing the parameters on to my function of choice (dfm) via pmap:
mymodels <- pmap(parameters, dfm, x = mycorpus)
(x = mycorpus is an extra parameter that stays constant across calls and is passed on to dfm.)
Here's what I got:
> length(mymodels)
[1] 8
> mymodels[[1]]
Document-feature matrix of: 5,572 documents, 7,714 features (99.8% sparse).
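For the grid in the question, the same pattern might look roughly like the sketch below. To keep it runnable without a particular xgboost version, it uses a hypothetical stand-in, fit_stub, with the same argument names as the call in the question. The objects train_stub and labels are also made up for illustration. Swapping in xgboost, train, and df$y is left as an assumption to check against your installed version.

```r
library(purrr)

# Hypothetical stand-in with the same argument names as the xgboost()
# call in the question; it just returns the arguments it was given.
fit_stub <- function(data, label, nrounds, subsample, colsample_bytree) {
  list(n_obs = nrow(data),
       n_label = length(label),
       nrounds = nrounds,
       subsample = subsample,
       colsample_bytree = colsample_bytree)
}

set.seed(1)
train_stub <- matrix(rnorm(40), nrow = 20)      # placeholder for `train`
labels <- sample(0:1, 20, replace = TRUE)       # placeholder for `df$y`

param <- expand.grid(nrounds = c(10, 50, 100),
                     subsample = c(0.5, 0.8, 0.9),
                     colsample_bytree = c(0.8))

# pmap() walks the data frame row by row and matches column names to
# argument names; `data` and `label` are passed unchanged to every call.
models <- pmap(param, fit_stub, data = train_stub, label = labels)
```

Each element of models corresponds to one row of param, in the same order, so models[[i]] can be paired back with param[i, ] when comparing results.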
Hope this helps you, or anyone else looking into this method!