The Python API doesn't give much more information other than that the `seed` parameter is passed to `numpy.random.seed`:

> seed (int) – Seed used to generate the folds (passed to numpy.random.seed).

But what features of `xgboost` use `numpy.random.seed`?

- `xgboost` with all default settings still produces the same performance even when altering the seed.
- `colsample_bytree` does use the seed; different seeds yield different performance.
- Presumably `subsample` and the other `colsample_*` features do as well, which seems plausible since any form of sampling requires randomness.

What other features of `xgboost` rely on `numpy.random.seed`?
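A minimal sketch of the kind of experiment described above: train the same model under different seeds and compare scores. The dataset, metric, and hyperparameters are illustrative choices, not from the original post.

```python
import numpy as np
import xgboost as xgb
from sklearn.datasets import make_classification

# Synthetic binary-classification data (an illustrative choice).
X, y = make_classification(n_samples=500, random_state=0)
dtrain = xgb.DMatrix(X, label=y)
base = {"objective": "binary:logistic", "verbosity": 0}

def train_accuracy(extra_params, seed):
    """Train a small booster with the given seed and return training accuracy."""
    params = {**base, **extra_params, "seed": seed}
    booster = xgb.train(params, dtrain, num_boost_round=20)
    preds = booster.predict(dtrain)
    return float(np.mean((preds > 0.5) == y))

# All-default tree growth: the seed has nothing to randomize, so scores match.
print([train_accuracy({}, s) for s in (1, 2, 3)])

# With column subsampling, different seeds can yield different scores.
print([train_accuracy({"colsample_bytree": 0.5}, s) for s in (1, 2, 3)])
```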
Boosted trees are grown sequentially, with tree growth within one iteration distributed among threads. To avoid overfitting, randomness is induced through the following parameters:

- `colsample_bytree`
- `colsample_bylevel`
- `colsample_bynode`
- `subsample` (note the `*sample*` pattern)
- `shuffle` in CV fold creation for cross-validation (see the sketch below)

In addition, you may encounter non-determinism, not controlled by the random state, in the following places:
- [GPU] Histogram building is not deterministic due to the non-associative aspect of floating-point summation.
- Using the `gblinear` booster with the `shotgun` updater is nondeterministic, as it uses the Hogwild algorithm.
- When using the GPU ranking objective, the result is not deterministic due to the non-associative aspect of floating-point summation.
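For the cross-validation case, the doc quote in the question refers to `xgboost.cv`, which shuffles rows into folds using the seed, so changing the seed changes the fold assignment and hence the reported metrics. A minimal sketch; the dataset and metric are illustrative assumptions:

```python
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, random_state=0)
dtrain = xgb.DMatrix(X, label=y)
params = {"objective": "binary:logistic"}

# Different seeds shuffle rows into different folds, so the reported
# metrics change even though the model parameters are identical.
for seed in (1, 2):
    res = xgb.cv(params, dtrain, num_boost_round=10, nfold=5,
                 shuffle=True, seed=seed, metrics="logloss")
    print(seed, res["test-logloss-mean"].iloc[-1])
```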
Comment: Re: how do you know this?

To know this, it helps to:

- Be aware of how trees are grown: Demystify Modern Gradient Boosting Trees (its references may also be helpful).
- Scan the full text of the documentation for the terms of interest: `random`, `sample`, `deterministic`, `determinism`, etc. (see the sketch after this list).
- Lastly (firstly?), know why sampling is needed in the first place; similar cases from counterparts like bagged trees (Random Forests by Leo Breiman) and neural networks (Deep Learning with Python by François Chollet, chapter on overfitting) may also be helpful.
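A hypothetical sketch of the documentation-scanning step; the path assumes a local clone of the xgboost repository, whose docs live under `doc/` as `.rst` files, and is not from the original comment:

```python
import pathlib

# Terms from the comment above.
terms = ("random", "sample", "deterministic", "determinism")
docs = pathlib.Path("xgboost/doc")  # assumed local clone; adjust as needed

# Print every docs line mentioning one of the terms, with its location.
for path in sorted(docs.rglob("*.rst")):
    for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), start=1):
        if any(term in line.lower() for term in terms):
            print(f"{path}:{lineno}: {line.strip()}")
```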