Minimum number of observations when performing Random Forest

Is it possible to apply random forests to very small datasets? I have a dataset with many variables but only 25 observations. Random forests produce reasonable results with low OOB errors (10-25%). Is there any rule of thumb regarding the minimum number of observations to use? In addition, one of the response variables is imbalanced, and if I subsample it I will end up with an even smaller number of observations. Thanks in advance.

324

asked Jul 09 '13 09:07

Oritteropus

1 Answer

Absolutely, RF can be used on this type of dataset (i.e. p > n). In fact, RF is used in fields like genomics, where there can be 20,000 or more variables and only a very small number of rows, say 10-12. There, the entire problem is figuring out which of the 20k variables would make up a parsimonious marker (i.e. feature selection is the entire problem).
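To make the p > n case concrete, here is a minimal sketch using scikit-learn (an assumption; the question does not say which library is in use). The data is synthetic, generated with `make_classification`, and the sizes (25 rows, 2000 columns, 5 informative) are illustrative only. It fits a forest, reads the OOB score, and ranks variables by importance as a rough first pass at feature selection.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a p >> n problem (assumed sizes, not real data):
# 25 rows, 2000 columns, of which only 5 carry signal.
X, y = make_classification(
    n_samples=25,
    n_features=2000,
    n_informative=5,
    n_redundant=0,
    random_state=0,
)

# oob_score=True scores each row using only the trees that did not see it.
forest = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
forest.fit(X, y)

# Rank columns by impurity-based importance and keep the 10 highest.
top10 = np.argsort(forest.feature_importances_)[::-1][:10]

print("OOB accuracy:", forest.oob_score_)
print("Top-ranked columns:", top10)
```

With so few rows the ranking can change noticeably from one random seed to the next, so it is probably worth repeating the fit with several seeds before trusting any single list of variables.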

I don't have any rules of thumb about minimum size, other than this: if your model doesn't work well on a held-out sample (leave-one-out cross-validation might work well in your case), then you should try something else.
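The leave-one-out check could look roughly like the sketch below, again assuming scikit-learn and synthetic data in place of the real 25 observations. The `class_weight="balanced"` setting is my own assumption, shown as one possible alternative to subsampling an imbalanced response; it is not something the question or answer prescribes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Synthetic, imbalanced stand-in for the real data (assumed sizes and class ratio).
X, y = make_classification(
    n_samples=25,
    n_features=200,
    n_informative=5,
    n_redundant=0,
    weights=[0.7, 0.3],
    random_state=0,
)

# class_weight="balanced" reweights classes instead of discarding rows (an assumption).
forest = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=0
)

# Each of the 25 rows is predicted by a forest trained on the other 24.
pred = cross_val_predict(forest, X, y, cv=LeaveOneOut())

print("LOO accuracy:", accuracy_score(y, pred))
print("LOO balanced accuracy:", balanced_accuracy_score(y, pred))
```

Balanced accuracy is printed alongside plain accuracy because, with an imbalanced response, a model that always predicts the majority class can still look good on plain accuracy.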

Hope this helps

108

answered Oct 21 '22 13:10

Wake2Sleep

Tags: machine-learning, random-forest, sample-size

Oritteropus

People also ask

1 Answers

Wake2Sleep

Recent Activity

Donate For Us