
Implementing a decision tree using h2o

I am trying to train a decision tree model using h2o. I am aware that no specific library for decision trees exists in h2o, but h2o has an implementation of random forest, H2ORandomForestEstimator. Can we implement a decision tree in h2o by tuning certain input arguments of the random forest? We can do that in the scikit module (a popular Python library for machine learning).

Ref link : Why is Random Forest with a single tree much better than a Decision Tree classifier?

In scikit-learn the code looks something like this:

RandomForestClassifier(n_estimators=1, max_features=None, bootstrap=False)

Do we have an equivalent of this code in h2o?
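To see why those three scikit-learn arguments collapse a random forest into a single decision tree, here is a minimal, self-contained sketch (plain Python, not the sklearn or h2o internals; the stump-based "forest" and all function names are illustrative only). A random forest differs from a single tree in exactly two ways: each tree sees a bootstrap resample of the rows, and each split considers a random subset of the features. Turning both off and building one tree leaves an ordinary decision tree:

```python
import random

def train_stump(X, y, feature_indices):
    """Find the best single (feature, threshold) split by classification error."""
    best, best_err = (0, 0.0, 0, 0), len(y) + 1
    for f in feature_indices:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            ll = max(set(left), key=left.count) if left else 0
            rl = max(set(right), key=right.count) if right else 0
            err = sum(lab != ll for lab in left) + sum(lab != rl for lab in right)
            if err < best_err:
                best_err, best = err, (f, t, ll, rl)
    return best

def predict_stump(stump, row):
    f, t, ll, rl = stump
    return ll if row[f] <= t else rl

def train_forest(X, y, n_estimators, max_features=None, bootstrap=True, seed=0):
    """Toy random forest of one-split trees (stumps)."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    k = max_features or n_feat          # max_features=None -> use all features
    stumps = []
    for _ in range(n_estimators):
        if bootstrap:                    # bootstrap=False -> train on the full data
            idx = [rng.randrange(len(X)) for _ in range(len(X))]
            Xs, ys = [X[i] for i in idx], [y[i] for i in idx]
        else:
            Xs, ys = X, y
        stumps.append(train_stump(Xs, ys, rng.sample(range(n_feat), k)))
    return stumps

def predict_forest(stumps, row):
    votes = [predict_stump(s, row) for s in stumps]
    return max(set(votes), key=votes.count)

X = [[0, 5], [1, 4], [2, 3], [3, 2]]
y = [0, 0, 1, 1]

# One tree, all features, no bootstrap: identical to a single decision tree.
forest = train_forest(X, y, n_estimators=1, max_features=None, bootstrap=False)
print([predict_forest(forest, r) for r in X])  # -> [0, 0, 1, 1]
```

With `n_estimators=1`, `bootstrap=False`, and `max_features=None`, the "forest" above trains exactly one tree on all rows with all features available at the split, which is the same degeneration the scikit-learn call relies on.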

ishaan arora, asked Jun 07 '18


People also ask

Can you build decision tree in Python?

Fortunately, it is quite easy to work with Decision Trees in Python thanks to the scikit-learn (sklearn) Python package. As you might know, sklearn is an efficient and simple library for Machine Learning that has plenty of ML algorithms, metrics, datasets, and additional tools implemented.
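As a concrete illustration of the sklearn route mentioned above, here is a minimal sketch (the toy dataset is made up for the example):

```python
from sklearn.tree import DecisionTreeClassifier

# Tiny toy dataset: one feature, two classes separated at x = 1.5
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

print(tree.predict([[0.5], [2.5]]))  # -> [0 1]
```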

Is there an API for decision trees in H2O?

Let's take a look at an in-depth tutorial that explains how to inspect decision trees and explores an API that allows access to every tree-based algorithm in H2O. H2O-3, the open-source Machine Learning platform, offers several algorithms based on decision trees.

Which tree-based algorithms are available in H2O?

We’ve created a unified API to access every single tree in all tree-based algorithms available in H2O. In addition, algorithms like GBM or XGBoost may be part of Stacked Ensembles models or leveraged by AutoML.

What's new in h2o-3?

In the latest stable release of H2O-3, we’ve made it possible for data scientists and developers to inspect the trees thoroughly, via a unified API covering every tree-based algorithm.

What is h2otree in H2O?

When trees are fetched, H2O’s endpoint named /3/Tree is called. The request and response format is described by the TreeV3 class, and request-handling logic is contained in the TreeHandler class. When an H2OTree is constructed, the H2O backend provides the tree structure to the client in a semi-compressed format, and the client builds the representation.


2 Answers

You can use H2O's random forest (H2ORandomForestEstimator): set ntrees=1 so that it builds only one tree, set mtries to the number of features (i.e. columns) in your dataset, and set sample_rate=1. Setting mtries to the number of features means the algorithm samples from all of your features at each level in the decision tree.

Here is more information about mtries: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/mtries.html

Lauren, answered Oct 31 '22


To add to Lauren's answer: based on PUBDEV-4324 - Expose Decision Tree as a stand-alone algo in H2O, both DRF and GBM can do the job, with GBM being marginally easier:

titanic_1tree = h2o.gbm(x = predictors, y = response, 
                        training_frame = titanicHex,
                        ntrees = 1, min_rows = 1, sample_rate = 1,            
                        col_sample_rate = 1,
                        max_depth = 5,
                        seed = 1)

which creates a decision tree at most 5 splits deep (max_depth = 5) on the Titanic dataset (available here: https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv).

Starting with release 3.22.0.1 (Xia) it's possible to extract tree structures from H2O models:

titanicH2oTree = h2o.getModelTree(model = titanic_1tree, tree_number = 1)
topchef, answered Oct 31 '22