I'm using scikit-learn's Random Forest implementation:

```python
rf = sklearn.ensemble.RandomForestClassifier(n_estimators=100,
                                             max_features="auto",
                                             max_depth=10)
```
After calling `rf.fit(...)`, the process's memory usage increases by 80 MB, i.e. 0.8 MB per tree. (I also tried many other settings, with similar results; I used `top` and `psutil` to monitor the memory usage.)
A binary tree of depth 10 should have, at most, 2^11 - 1 = 2047 elements, which can all be stored in one dense array, allowing the programmer to find the parent and children of any given element easily. Each element needs only the index of the feature used in the split and the cut-off value, or 6-16 bytes depending on how economical the programmer is. This translates into 0.01-0.03 MB per tree in my case.
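For illustration, here is a minimal sketch of the dense-array layout I have in mind (my own toy structure, not scikit-learn's actual format):

```python
import numpy as np

MAX_DEPTH = 10
N_NODES = 2 ** (MAX_DEPTH + 1) - 1  # 2047 slots for a complete binary tree

# One packed record per node: int16 feature index + float32 cut-off = 6 bytes.
# The children of node i live at 2*i + 1 and 2*i + 2, its parent at (i - 1) // 2,
# so no pointers need to be stored at all.
tree = np.zeros(N_NODES, dtype=np.dtype([("feature", np.int16),
                                         ("threshold", np.float32)]))

print(tree.nbytes / 2**20)  # ~0.012 MB per tree, matching the estimate above
```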
Why does scikit-learn's implementation use 20-60x as much memory to store a tree of the random forest?
Each decision (non-leaf) node stores the left and right branch integer indices (2 × 8 bytes), the index of the feature used for the split (8 bytes), the float value of the threshold for the decision feature (8 bytes), and the decrease in impurity (8 bytes). Furthermore, leaf nodes store the constant target value predicted by the leaf.
You can have a look at the Cython class definition in the source code (`sklearn/tree/_tree.pyx`) for the details.
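You can also sanity-check the numbers on a fitted forest through the `Tree` object's public array attributes. A rough sketch (the exact set of per-node fields varies between scikit-learn releases, so treat the totals as approximate):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)
rf = RandomForestClassifier(n_estimators=100, max_depth=10).fit(X, y)

t = rf.estimators_[0].tree_
per_node_arrays = [t.children_left, t.children_right,   # int64 child indices
                   t.feature, t.threshold, t.impurity,  # split description
                   t.n_node_samples, t.value]           # statistics and outputs
total = sum(a.nbytes for a in per_node_arrays)
print(t.node_count, "nodes,", total / 2**20, "MB")      # several 8-byte fields per node
```

Note that `value` alone holds `node_count × n_outputs × n_classes` doubles, so it can dominate for problems with many classes, and none of this counts the Python object overhead of the 100 `DecisionTreeClassifier` wrappers themselves.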