
h2o GLM GridSearch lambda value

Tags: python, glm, h2o

I am using H2O (Python) and running H2OGridSearch over alpha values for a GLM (H2OGeneralizedLinearEstimator), with lambda_search=True and k-fold cross-validation.

How can I get the best model's lambda value?

EDIT: Fully reproducible example

Data:

34.40 17:1 73:1 127:1 265:1 912:1 1162:1 1512:1 1556:1 1632:1 1738:1
205.10 127:1 138:1 338:1 347:1 883:1 912:1 1120:1 1122:1 1512:1
7.75 66:1 127:1 347:1 602:1 1422:1 1512:1 1535:1 1738:1
8.85 127:1 608:1 906:1 979:1 1077:1 1512:1 1738:1
51.80 127:1 347:1 608:1 766:1 912:1 928:1 952:1 1034:1 1512:1 1610:1 1738:1
110.00 127:1 229:1 347:1 602:1 608:1 1171:1 1512:1 1718:1
8.90 66:1 127:1 205:1 347:1 490:1 589:1 912:1 1016:1 1512:1

Call this file h2o_example.svmlight

Then run:

import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
from h2o.grid.grid_search import H2OGridSearch

h2o.init()

h2o_data = h2o.import_file("h2o_example.svmlight")
cols = h2o_data.columns[1:]
hyper_parameters = {"alpha": [0.0, 0.01, 0.99, 1.0]}
grid = H2OGridSearch(H2OGeneralizedLinearEstimator(family="gamma", link="log", lambda_search=True, nfolds=2, intercept=True, standardize=False),
                     hyper_params=hyper_parameters)
grid.train(y="C1", x=cols, training_frame=h2o_data)
grid_table = grid.get_grid(sort_by="r2", decreasing=True)
best = grid_table.models[0]
best.actual_params["lambda"]
best.actual_params["alpha"]

The last two commands fail, giving me an error:

TypeError: 'property' object has no attribute '__getitem__'

Apparently I am using lambda_search incorrectly. How can I get a single alpha and lambda value for the best model according to my criterion?

asked Jun 19 '26 15:06 by user90772


1 Answer

Final EDIT

There are multiple ways of getting lambda (shown below), but here are two concise ones. (Note: fully reproducible code is at the bottom.)

If you set lambda_search = True, look at the lambda_search column of the model summary table and check the value reported for lambda.min, which is your best lambda:

model.summary()['lambda_search']

which will produce a list with a string similar to:

['nlambda = 100, lambda.max = 12.733, lambda.min = 0.05261, lambda.1se = -1.0']
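
Since that entry is a single string, you may want the numbers out of it. Here is a minimal sketch, assuming the string keeps the "name = value" layout shown above (the helper name parse_lambda_search is mine, not part of H2O):

```python
import re

def parse_lambda_search(entry):
    """Turn 'name = value, name = value' text into a dict of floats."""
    # Names may contain dots (lambda.min); values may be negative or use an exponent.
    pairs = re.findall(
        r"([\w.]+)\s*=\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)", entry
    )
    return {name: float(value) for name, value in pairs}

# The example string from above, standing in for model.summary()['lambda_search'][0]
example = "nlambda = 100, lambda.max = 12.733, lambda.min = 0.05261, lambda.1se = -1.0"
parsed = parse_lambda_search(example)
best_lambda = parsed["lambda.min"]
print(best_lambda)
```

If the summary format differs in your H2O version, the regular expression will need adjusting.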

If you don't use lambda search, whether or not you set a lambda value yourself, you can also read it from the summary table:

model.summary()['regularization']

The output looks like:

['Elastic Net (alpha = 0.5, lambda = 0.01289 )']
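
To pull alpha and lambda out of that string as numbers, something like the following should work. This is only a sketch that assumes the text looks like the example above; parse_regularization is a hypothetical helper, not an H2O function:

```python
import re

def parse_regularization(entry):
    """Extract alpha and lambda from a regularization summary string."""
    found = {}
    for name in ("alpha", "lambda"):
        # Capture the run of numeric characters that follows 'name ='
        match = re.search(name + r"\s*=\s*([-+0-9.eE]+)", entry)
        if match:
            found[name] = float(match.group(1))
    return found

# The example string from above, standing in for model.summary()['regularization'][0]
example = "Elastic Net (alpha = 0.5, lambda = 0.01289 )"
values = parse_regularization(example)
print(values["alpha"], values["lambda"])
```

A key that is missing from the string is simply left out of the returned dict.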

Other options:

Look at the actual parameters of the model:

best.actual_params['lambda']
best.actual_params['alpha']

where best is the best model in your grid search results.
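
As for the TypeError in the question: it names a 'property' object, which is what Python hands back when a property is read from a class instead of an instance. I can't confirm that this is what happened inside the grid object, so treat the following as an illustration of the error message only. The Model class is hypothetical and has nothing to do with H2O's internals:

```python
class Model:
    """Hypothetical stand-in with a read-only actual_params property."""

    def __init__(self):
        self._params = {"lambda": [0.05], "alpha": [0.5]}

    @property
    def actual_params(self):
        return dict(self._params)

def lookup(obj, key):
    """Index obj.actual_params, reporting a TypeError instead of raising it."""
    try:
        return obj.actual_params[key]
    except TypeError as err:
        return "TypeError: {}".format(err)

on_instance = lookup(Model(), "lambda")  # the property runs and returns a dict
on_class = lookup(Model, "lambda")       # the raw property object cannot be indexed
print(on_instance)
print(on_class)
```

Python 2 reports this as "has no attribute '__getitem__'", while Python 3 says the object "is not subscriptable". Either way, checking type(best) before indexing is a quick way to see whether you are holding a trained model.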

First EDIT

To get the best model, you can do:

grid_table = grid.get_grid(sort_by='r2', decreasing=True)
best = grid_table.models[0]

Then you can use:

best.actual_params['lambda']

Fully reproducible example

import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
h2o.init()

# import the airlines dataset:
# This dataset is used to classify whether a flight will be delayed ("YES") or not ("NO")
# original data can be found at http://www.transtats.bts.gov/
airlines = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip")

# convert columns to factors
airlines["Year"] = airlines["Year"].asfactor()
airlines["Month"] = airlines["Month"].asfactor()
airlines["DayOfWeek"] = airlines["DayOfWeek"].asfactor()
airlines["Cancelled"] = airlines["Cancelled"].asfactor()
airlines["FlightNum"] = airlines["FlightNum"].asfactor()

# set the predictor names and the response column name
predictors = ["Origin", "Dest", "Year", "UniqueCarrier", "DayOfWeek", "Month", "Distance", "FlightNum"]
response = "IsDepDelayed"

# split into train and validation sets
train, valid = airlines.split_frame(ratios = [.8])

# try using the `lambda_` parameter:
# initialize your estimator
airlines_glm = H2OGeneralizedLinearEstimator(family = 'binomial', lambda_ = .0001)

# then train your model
airlines_glm.train(x = predictors, y = response, training_frame = train, validation_frame = valid)

# print the auc for the validation data
print(airlines_glm.auc(valid=True))


# Example of values to grid over for `lambda`
# import Grid Search
from h2o.grid.grid_search import H2OGridSearch

# select the values for lambda_ to grid over
hyper_params = {'lambda': [1, 0.5, 0.1, 0.01, 0.001, 0.0001, 0.00001, 0]}

# this example uses cartesian grid search because the search space is small
# and we want to see the performance of all models. For a larger search space use
# random grid search instead: {'strategy': "RandomDiscrete"}
# initialize the glm estimator
airlines_glm_2 = H2OGeneralizedLinearEstimator(family = 'binomial')

# build grid search with previously made GLM and hyperparameters
grid = H2OGridSearch(model = airlines_glm_2, hyper_params = hyper_params,
                     search_criteria = {'strategy': "Cartesian"})

# train using the grid
grid.train(x = predictors, y = response, training_frame = train, validation_frame = valid)

# sort the grid models by decreasing AUC
grid_table = grid.get_grid(sort_by = 'auc', decreasing = True)
print(grid_table)

best = grid_table.models[0]
print(best.actual_params['lambda'])
answered Jun 21 '26 05:06 by Lauren