I am trying to train a decision tree model using h2o. I am aware that no specific library for decision trees exists in h2o, but h2o has an implementation of random forest, H2ORandomForestEstimator. Can we implement a decision tree in h2o by tuning certain input arguments of random forest? We can do that in scikit-learn (a popular Python library for machine learning).
Ref link : Why is Random Forest with a single tree much better than a Decision Tree classifier?
In scikit-learn the code looks something like this:
RandomForestClassifier(n_estimators=1, max_features=None, bootstrap=False)
Do we have an equivalent of this code in h2o?
Let's take a look at an in-depth tutorial that explains how to inspect decision trees and explores an API that gives access to every tree-based algorithm in H2O. H2O-3, the open-source machine learning platform, offers several algorithms based on decision trees. In the latest stable release, we've made it possible for data scientists and developers to inspect the trees thoroughly: there is a unified API to access every single tree in all tree-based algorithms available in H2O. In addition, algorithms like GBM or XGBoost may be part of Stacked Ensembles models or leveraged by AutoML.
When trees are fetched, H2O's endpoint named /3/Tree is called. The request and response format is described by the TreeV3 class, and request handling logic is contained in the TreeHandler class. When an H2OTree is constructed, the H2O backend provides the tree structure to the client in a semi-compressed format, and the client builds the representation from it.
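As a sketch of what that client-side fetch looks like in Python (assuming the `h2o` package, a running cluster, and a trained tree-based model; the attribute names follow the `h2o.tree.H2OTree` documentation):

```python
def fetch_tree(model, tree_number=0):
    """Fetch one tree of a trained tree-based H2O model via the /3/Tree endpoint."""
    from h2o.tree import H2OTree  # imported lazily: requires the h2o package and a live cluster

    tree = H2OTree(model=model, tree_number=tree_number)
    # The backend replies in a semi-compressed format; the client rebuilds it
    # as parallel arrays plus a linked node structure:
    return {
        "root": tree.root_node,             # entry point for walking the tree
        "left": tree.left_children,         # index of each node's left child (-1 = leaf)
        "right": tree.right_children,       # index of each node's right child
        "descriptions": tree.descriptions,  # human-readable split descriptions
    }
```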
You can use H2O's random forest (H2ORandomForestEstimator): set ntrees=1 so that it builds only one tree, set mtries to the number of features (i.e. columns) in your dataset, and set sample_rate=1. Setting mtries to the number of features means the algorithm will consider all of your features at each level of the decision tree (rather than a random subset), and sample_rate=1 means every row is used, so there is no bootstrap sampling.
Here is more information about mtries: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/mtries.html
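Putting that recipe into code, here is a minimal Python sketch. The parameter translation is the point; the training helper assumes a running H2O cluster, and `training_frame`, `x`, `y` are caller-supplied placeholders, not names from the answer:

```python
def single_tree_params(n_predictors):
    """Translate RandomForestClassifier(n_estimators=1, max_features=None,
    bootstrap=False) into H2ORandomForestEstimator keyword arguments."""
    return {
        "ntrees": 1,             # n_estimators=1    -> build exactly one tree
        "mtries": n_predictors,  # max_features=None -> try all columns at each split
        "sample_rate": 1.0,      # bootstrap=False   -> use every row, no resampling
    }

def train_decision_tree(training_frame, x, y):
    """Train a single-tree 'random forest' on an H2OFrame (needs a live cluster)."""
    from h2o.estimators import H2ORandomForestEstimator  # requires the h2o package

    model = H2ORandomForestEstimator(**single_tree_params(len(x)))
    model.train(x=x, y=y, training_frame=training_frame)
    return model
```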
To add to Lauren's answer: based on PUBDEV-4324 (Expose Decision Tree as a stand-alone algo in H2O), both DRF and GBM can do the job, with GBM being marginally easier:
titanic_1tree = h2o.gbm(x = predictors, y = response,
                        training_frame = titanicHex,
                        ntrees = 1, min_rows = 1, sample_rate = 1,
                        col_sample_rate = 1,
                        max_depth = 5,
                        seed = 1)
which creates a decision tree at most 5 splits deep (max_depth = 5) on the Titanic dataset (available here: https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv)
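For Python users, the same single-tree GBM can be sketched with `H2OGradientBoostingEstimator`. This is an assumed translation of the R call above, not code from the answer, and the training helper expects a running H2O cluster:

```python
def single_tree_gbm_params(max_depth=5, seed=1):
    """Keyword arguments that reduce H2O's GBM to one decision tree."""
    return {
        "ntrees": 1,             # one boosting iteration = one tree
        "min_rows": 1,           # allow leaves down to a single observation
        "sample_rate": 1.0,      # no row subsampling
        "col_sample_rate": 1.0,  # no column subsampling
        "max_depth": max_depth,  # at most this many splits deep
        "seed": seed,            # reproducible splits
    }

def train_single_tree_gbm(training_frame, x, y):
    """Train a one-tree GBM on an H2OFrame (needs a live cluster)."""
    from h2o.estimators import H2OGradientBoostingEstimator  # requires the h2o package

    model = H2OGradientBoostingEstimator(**single_tree_gbm_params())
    model.train(x=x, y=y, training_frame=training_frame)
    return model
```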
Starting with release 3.22.0.1 (Xia) it's possible to extract tree structures from H2O models:
titanicH2oTree = h2o.getModelTree(model = titanic_1tree, tree_number = 1)