I am trying to train a decision tree model using h2o. I am aware that no specific library for decision trees exists in h2o, but h2o has an implementation of random forest, H2ORandomForestEstimator. Can we implement a decision tree in h2o by tuning certain input arguments of random forest? We can do that in scikit-learn (a popular Python library for machine learning).
Ref link : Why is Random Forest with a single tree much better than a Decision Tree classifier?
In scikit-learn the code looks something like this:
RandomForestClassifier(n_estimators=1, max_features=None, bootstrap=False)
Do we have an equivalent of this code in h2o?
Let's take a look at an in-depth tutorial that explains how to inspect decision trees and explores an API that gives access to every tree-based algorithm in H2O. H2O-3, the open-source machine learning platform, offers several algorithms based on decision trees. In the latest stable release, we've made it possible for data scientists and developers to inspect the trees thoroughly: there is a unified API to access every single tree in all tree-based algorithms available in H2O. In addition, algorithms like GBM or XGBoost may be part of Stacked Ensembles models or leveraged by AutoML.
When trees are fetched, H2O's endpoint named /3/Tree is called. The request and response format is described by the TreeV3 class, and request handling logic is contained in the TreeHandler class. When an H2OTree is constructed, the H2O backend provides the tree structure to the client in a semi-compressed format, and the client builds the representation from it.
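As a sketch of what that client-side fetch looks like in Python (assuming the `h2o` package, a running cluster, and a trained tree-based model; the attribute names follow the `h2o.tree.H2OTree` documentation):

```python
def fetch_tree(model, tree_number=0):
    """Fetch one tree of a trained tree-based H2O model via the /3/Tree endpoint."""
    from h2o.tree import H2OTree  # imported lazily: requires the h2o package and a live cluster

    tree = H2OTree(model=model, tree_number=tree_number)
    # The backend replies in a semi-compressed format; the client rebuilds it
    # as parallel arrays plus a linked node structure:
    return {
        "root": tree.root_node,             # entry point for walking the tree
        "left": tree.left_children,         # index of each node's left child (-1 = leaf)
        "right": tree.right_children,       # index of each node's right child
        "descriptions": tree.descriptions,  # human-readable split descriptions
    }
```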
You can use H2O's random forest (H2ORandomForestEstimator): set ntrees=1 so that it builds only one tree, set mtries to the number of features (i.e. columns) in your dataset, and set sample_rate=1. Setting mtries to the number of features means the algorithm will consider all of your features at each level of the decision tree (rather than a random subset), and sample_rate=1 means every row is used, so there is no bootstrap sampling.
Here is more information about mtries: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/mtries.html
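Putting that recipe into code, here is a minimal Python sketch. The parameter translation is the point; the training helper assumes a running H2O cluster, and `training_frame`, `x`, `y` are caller-supplied placeholders, not names from the answer:

```python
def single_tree_params(n_predictors):
    """Translate RandomForestClassifier(n_estimators=1, max_features=None,
    bootstrap=False) into H2ORandomForestEstimator keyword arguments."""
    return {
        "ntrees": 1,             # n_estimators=1    -> build exactly one tree
        "mtries": n_predictors,  # max_features=None -> try all columns at each split
        "sample_rate": 1.0,      # bootstrap=False   -> use every row, no resampling
    }

def train_decision_tree(training_frame, x, y):
    """Train a single-tree 'random forest' on an H2OFrame (needs a live cluster)."""
    from h2o.estimators import H2ORandomForestEstimator  # requires the h2o package

    model = H2ORandomForestEstimator(**single_tree_params(len(x)))
    model.train(x=x, y=y, training_frame=training_frame)
    return model
```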
To add to Lauren's answer: based on PUBDEV-4324 (Expose Decision Tree as a stand-alone algo in H2O), both DRF and GBM can do the job, with GBM being marginally easier:
titanic_1tree = h2o.gbm(x = predictors, y = response,
                        training_frame = titanicHex,
                        ntrees = 1, min_rows = 1, sample_rate = 1,
                        col_sample_rate = 1,
                        max_depth = 5,
                        seed = 1)
which creates a decision tree at most 5 splits deep (max_depth = 5) on the Titanic dataset (available here: https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv)
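For Python users, the same single-tree GBM can be sketched with `H2OGradientBoostingEstimator`. This is an assumed translation of the R call above, not code from the answer, and the training helper expects a running H2O cluster:

```python
def single_tree_gbm_params(max_depth=5, seed=1):
    """Keyword arguments that reduce H2O's GBM to one decision tree."""
    return {
        "ntrees": 1,             # one boosting iteration = one tree
        "min_rows": 1,           # allow leaves down to a single observation
        "sample_rate": 1.0,      # no row subsampling
        "col_sample_rate": 1.0,  # no column subsampling
        "max_depth": max_depth,  # at most this many splits deep
        "seed": seed,            # reproducible splits
    }

def train_single_tree_gbm(training_frame, x, y):
    """Train a one-tree GBM on an H2OFrame (needs a live cluster)."""
    from h2o.estimators import H2OGradientBoostingEstimator  # requires the h2o package

    model = H2OGradientBoostingEstimator(**single_tree_gbm_params())
    model.train(x=x, y=y, training_frame=training_frame)
    return model
```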
Starting with release 3.22.0.1 (Xia) it's possible to extract tree structures from H2O models:
titanicH2oTree = h2o.getModelTree(model = titanic_1tree, tree_number = 1)