subsample, colsample_bytree, colsample_bylevel in XGBClassifier() Python 3.x

Tags: python-3.x, xgboost

I've spent a good deal of time trying to find out what "subsample", "colsample_bytree", and "colsample_bylevel" actually do in XGBClassifier(), but I can't find a clear explanation. Can someone please explain briefly what they do?

Thanks!

749

asked Jun 25 '18 11:06

Pyrowomat

1 Answer

The idea behind "subsample", "colsample_bytree", and "colsample_bylevel" comes from Random Forests. There, you build an ensemble of many trees and then combine their outputs (by voting or averaging) when making a prediction.

The "random" part comes from two things: each tree is trained on a random sample of the training data (bootstrapping), and each tree (more precisely, each node of each tree) is built considering only a random subset of the attributes.

In other words, for each tree in a random forest you:

Select a random sample from the dataset to train this tree;
For each node of this tree, use a random subset of the features. This reduces overfitting and decorrelates the trees.
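The two steps above can be sketched with the standard library alone. This is only an illustration of the sampling, not a working random forest: the helper names `bootstrap_rows` and `node_features`, the dataset size, and the choice of 3 features per node are all assumptions made for the example.

```python
import random


def bootstrap_rows(n_rows, rng):
    """Draw n_rows row indices WITH replacement (a bootstrap sample)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def node_features(n_features, k, rng):
    """Pick k distinct candidate features for one node's split search."""
    return sorted(rng.sample(range(n_features), k))


rng = random.Random(0)       # fixed seed so the run is repeatable
n_rows, n_features = 10, 9   # hypothetical dataset shape
k = 3                        # assumed: roughly sqrt(n_features) per node

for tree in range(3):
    # Step 1: each tree gets its own random sample of the rows.
    rows = bootstrap_rows(n_rows, rng)
    print(f"tree {tree}: rows={sorted(rows)}")
    # Step 2: each node only looks at a fresh random subset of features.
    for node in range(2):
        feats = node_features(n_features, k, rng)
        print(f"  node {node}: candidate features={feats}")
```

Because the rows are drawn with replacement, some indices show up more than once in a tree's sample and others not at all, which is what makes the trees differ from one another.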

Like a random forest, XGBoost builds an ensemble of weak models that, put together, give robust and accurate results, although it adds them one after another (boosting) rather than training them independently. The weak models can be decision trees, which can be randomized in much the same way as in random forests. In this case:

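"subsample" is the fraction of the training samples (randomly selected, once per boosting iteration) that will be used to train each tree.
"colsample_bytree" is the fraction of features (randomly selected) that will be used to train each tree.
"colsample_bylevel" is the fraction of features (randomly selected from those already chosen for the tree) that will be considered at each depth level of the tree. A new subset is drawn each time a new level is reached. Sampling per node (per split) is controlled by a separate parameter, "colsample_bynode".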
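To make the nesting concrete, here is a rough standard-library simulation of how the three fractions narrow down rows and columns. It does not call XGBoost; the helper `sample_fraction`, the rounding rule (round down, keep at least one), and the dataset size of 100 rows by 64 features are assumptions made for illustration. The keys of `params` are the real parameter names and could be passed as `xgboost.XGBClassifier(**params)`.

```python
import random


def sample_fraction(items, fraction, rng):
    """Sample a fraction of items without replacement.

    Assumed rounding for this sketch: round down, but keep at least one.
    """
    k = max(1, int(fraction * len(items)))
    return sorted(rng.sample(items, k))


# Real XGBClassifier parameter names; the values are arbitrary examples.
params = {"subsample": 0.75, "colsample_bytree": 0.5, "colsample_bylevel": 0.5}

rng = random.Random(42)
all_rows = list(range(100))   # hypothetical: 100 training samples
all_cols = list(range(64))    # hypothetical: 64 features

# Once per tree: pick the rows and the columns this tree may use.
tree_rows = sample_fraction(all_rows, params["subsample"], rng)
tree_cols = sample_fraction(all_cols, params["colsample_bytree"], rng)

# Once per depth level: pick again, but only from the tree's columns.
level_cols = {
    depth: sample_fraction(tree_cols, params["colsample_bylevel"], rng)
    for depth in range(3)
}

print("rows per tree:", len(tree_rows))
print("columns per tree:", len(tree_cols))
print("columns per level:", [len(c) for c in level_cols.values()])
```

The column fractions multiply: with 64 features, 0.5 per tree and 0.5 per level leave 16 candidate features at each level of a given tree.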

200

answered Sep 28 '22 19:09

Álvaro Salgado
