I am using H2O (Python) where I am playing with H2OGridSearch for alpha values of a GLM (H2OGeneralizedLinearEstimator), also using lambda_search=True using k-fold cross-validation.
How can I get the best model's lambda value?
EDIT: Fully reproducible example
Data:
34.40 17:1 73:1 127:1 265:1 912:1 1162:1 1512:1 1556:1 1632:1 1738:1
205.10 127:1 138:1 338:1 347:1 883:1 912:1 1120:1 1122:1 1512:1
7.75 66:1 127:1 347:1 602:1 1422:1 1512:1 1535:1 1738:1
8.85 127:1 608:1 906:1 979:1 1077:1 1512:1 1738:1
51.80 127:1 347:1 608:1 766:1 912:1 928:1 952:1 1034:1 1512:1 1610:1 1738:1
110.00 127:1 229:1 347:1 602:1 608:1 1171:1 1512:1 1718:1
8.90 66:1 127:1 205:1 347:1 490:1 589:1 912:1 1016:1 1512:1
Call this file h2o_example.svmlight
Then run:
h2o_data = h2o.import_file("h2o_example.svmlight")
cols = h2o_data.columns[1:]
hyper_parameters = {"alpha": [0.0, 0.01, 0.99, 1.0]}
grid = H2OGridSearch(H2OGeneralizedLinearEstimator(family="gamma", link="log", lambda_search=True, nfolds=2, intercept=True, standardize=False),
hyper_params=hyper_parameters)
grid.train(y="C1", x=cols, training_frame=h2o_data)
grid_table = grid.get_grid(sort_by="r2", decreasing=True)
best = grid_table.models[0]
best.actual_params["lambda"]
best.actual_params["alpha"]
The last two commands fail, giving me an error:
TypeError: 'property' object has no attribute '__getitem__'
Apparently, I am using lambda_search in a wrong way. How can I get a single alpha and lambda value for the best model according to my criterion?
Final EDIT
There are multiple ways of getting lambda (shown below) but here are two concise ways of getting lambda.(Note fully reproducible code is at the bottom)
If you have lambda_search = True, you can look at the model summary table under the lambda_search column and see what value is set for lambda.min, which is your best lambda
model.summary()['lambda_search']
which will produce a list with a string similar to:
['nlambda = 100, lambda.max = 12.733, lambda.min = 0.05261, lambda.1se = -1.0']
if you don't use lambda search and don't set a lambda value (or do set it) you can also use the summary table
model.summary()['regularization']
output looks like:
['Elastic Net (alpha = 0.5, lambda = 0.01289 )']
Other options:
look at the actual parameters of the model:
best.actual_params['lambda']
best.actual_params['alpha']
where best was your best model in the grid search results
First EDIT
to get the best model you can do
grid_table = grid.get_grid(sort_by='r2', decreasing=True)
best = grid_table.models[0]
Then you can use:
best.actual_params['lambda']
Fully reproducible example
import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
h2o.init()
# import the airlines dataset:
# This dataset is used to classify whether a flight will be delayed 'YES' or not "NO"
# original data can be found at http://www.transtats.bts.gov/
airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip")
# convert columns to factors
airlines["Year"]= airlines["Year"].asfactor()
airlines["Month"]= airlines["Month"].asfactor()
airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor()
airlines["Cancelled"] = airlines["Cancelled"].asfactor()
airlines['FlightNum'] = airlines['FlightNum'].asfactor()
# set the predictor names and the response column name
predictors = ["Origin", "Dest", "Year", "UniqueCarrier", "DayOfWeek", "Month", "Distance", "FlightNum"]
response = "IsDepDelayed"
# split into train and validation sets
train, valid= airlines.split_frame(ratios = [.8])
# try using the `lambda_` parameter:
# initialize your estimator
airlines_glm = H2OGeneralizedLinearEstimator(family = 'binomial', lambda_ = .0001)
# then train your model
airlines_glm.train(x = predictors, y = response, training_frame = train, validation_frame = valid)
# print the auc for the validation data
print(airlines_glm.auc(valid=True))
# Example of values to grid over for `lambda`
# import Grid Search
from h2o.grid.grid_search import H2OGridSearch
# select the values for lambda_ to grid over
hyper_params = {'lambda': [1, 0.5, 0.1, 0.01, 0.001, 0.0001, 0.00001, 0]}
# this example uses cartesian grid search because the search space is small
# and we want to see the performance of all models. For a larger search space use
# random grid search instead: {'strategy': "RandomDiscrete"}
# initialize the glm estimator
airlines_glm_2 = H2OGeneralizedLinearEstimator(family = 'binomial')
# build grid search with previously made GLM and hyperparameters
grid = H2OGridSearch(model = airlines_glm_2, hyper_params = hyper_params,
search_criteria = {'strategy': "Cartesian"})
# train using the grid
grid.train(x = predictors, y = response, training_frame = train, validation_frame = valid)
# sort the grid models by decreasing AUC
grid_table = grid.get_grid(sort_by = 'auc', decreasing = True)
print(grid_table)
best = grid_table.models[0]
print(best.actual_params['lambda'])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With