I know that a seed is generally set so that we can reproduce the same result. But what does setting the seed actually do in the random forest part? Does it change any of the arguments of the randomForest() function in R, such as ntree or sampsize?
I am using different seeds for my random forest model each time, but want to know how different seeds affect a random forest model.
Setting a seed fixes the starting state of the random number generator, so that it produces the same random numbers on every run of the code, on the same machine or on different machines (for a given seed value and the same generator). The seed is only the starting point: after it, each number is derived from the state left behind by the previous draw.
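To make the idea of generator state concrete, here is a minimal base R sketch. The seed values 500 and 501 and the variable names are arbitrary choices for illustration. It shows that re-seeding restarts the same stream, that the stream does not depend on how the draws are split up, and that a different seed gives a different stream.

```r
# Same seed, five draws in one call.
set.seed(500)
all_at_once <- runif(5)

# Same seed again, but the five draws are split over two calls.
# The generator simply continues from the state left by the first call.
set.seed(500)
in_two_steps <- c(runif(2), runif(3))

# A different seed starts a different stream.
set.seed(501)
other_seed <- runif(5)

identical(all_at_once, in_two_steps)  # TRUE
identical(all_at_once, other_seed)    # FALSE
```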
set.seed=500 creates a variable called set.seed and sets it to 500. It does not set the random number generator seed. Call the set.seed() function instead, as in set.seed(500).
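The difference between assigning to set.seed and calling set.seed() is easy to check in base R. This is a small sketch; the variable names are made up for the example.

```r
# Assignment: this only creates a numeric variable named set.seed.
set.seed = 500
a <- runif(3)
set.seed = 500
b <- runif(3)
identical(a, b)    # FALSE: the generator was never re-seeded

# Remove the stray variable so it does not cause confusion later.
rm(set.seed)

# Function call: this actually seeds the generator.
set.seed(500)
x <- runif(3)
set.seed(500)
y <- runif(3)
identical(x, y)    # TRUE
```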
The seed value 42 that shows up in many examples is a pop-culture reference. In Douglas Adams's popular 1979 science-fiction novel The Hitchhiker's Guide to the Galaxy, towards the end of the book, the supercomputer Deep Thought reveals that the answer to the great question of “life, the universe and everything” is 42.
When you use statistical software to generate random numbers, you usually have an option to specify a random number seed. A seed is a positive integer that initializes a random-number generator (technically, a pseudorandom-number generator). A seed enables you to create reproducible streams of random numbers.
Trees grow from seeds and so do forests ;-) (scnr)
There are different ways to build a random forest, but what they all have in common is that multiple trees are built. To improve classification accuracy over a single decision tree, the individual trees in a random forest need to differ; otherwise you would just have ntree copies of the same tree. This difference is achieved by introducing randomness into the generation of the trees. That randomness is driven by the seed, and the most important property of the seed is that using the same seed should always generate the same result. The seed does not change arguments such as ntree or sampsize; it only determines which random draws are made.
How does the randomness influence how the trees are built? There are multiple ways:
- Build each tree on a random subset: for each individual tree of the forest, a subset of the training examples is drawn, and the tree is then built on that subset.
- At each decision point in the tree, the decision attribute is selected randomly, typically by picking the best split among a random subset of the attributes (mtry in randomForest()).
Often these two elements are combined.
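The two sources of randomness can be sketched in base R without fitting any trees. The helper draw_forest_randomness() below is hypothetical and is not how randomForest() is implemented internally. It only records which rows and which candidate attributes would be drawn. Its arguments ntree, sampsize and mtry borrow the names of the randomForest() arguments, and the values used here are chosen purely for illustration.

```r
# Hypothetical helper: record the random choices a forest would make.
draw_forest_randomness <- function(seed, n_rows, feature_names,
                                   ntree = 3, sampsize = n_rows, mtry = 2) {
  set.seed(seed)
  lapply(seq_len(ntree), function(i) {
    list(
      # Source 1: the random subset of training rows for tree i.
      rows = sample(n_rows, sampsize, replace = TRUE),
      # Source 2: the random candidate attributes at one decision point.
      features = sample(feature_names, mtry)
    )
  })
}

features <- names(iris)[1:4]  # the four numeric columns of the built-in iris data
run_a <- draw_forest_randomness(1, nrow(iris), features)
run_b <- draw_forest_randomness(1, nrow(iris), features)
run_c <- draw_forest_randomness(2, nrow(iris), features)

identical(run_a, run_b)    # TRUE: same seed, same rows and attributes
identical(run_a, run_c)    # FALSE: different seed, different draws
length(run_c)              # 3: ntree is unaffected by the seed
length(run_c[[1]]$rows)    # 150: sampsize is unaffected by the seed
```

Under these assumptions, changing the seed changes which rows and attributes each tree sees, so the fitted trees and the resulting predictions can vary slightly from seed to seed, while the number of trees and the sample size stay exactly as specified.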
http://link.springer.com/article/10.1023%2FA%3A1010933404324#page-1