I have a large dataset in R (1M+ rows by 6 columns) that I want to use to train a random forest (using the randomForest
package) for regression purposes. Unfortunately, I get a Error in matrix(0, n, n) : too many elements specified
error when trying to do the whole thing at once and cannot allocate enough memory kind of errors when running in on a subset of the data -- down to 10,000 or so observations.
Seeing that there is no chance I can add more RAM on my machine and random forests are very suitable for the type of process I am trying to model, I'd really like to make this work.
Any suggestions or workaround ideas are much appreciated.
They suggest that a random forest should have a number of trees between 64 - 128 trees. With that, you should have a good balance between ROC AUC and processing time.
Random forests is great with high dimensional data since we are working with subsets of data. It is faster to train than decision trees because we are working only on a subset of features in this model, so we can easily work with hundreds of features.
Conclusion: In small datasets from two-phase sampling design, variable screening and inverse sampling probability weighting are important for achieving good prediction performance of random forests. In addition, stacking random forests and simple linear models can offer improvements over random forests.
Both the Random Forest and Neural Networks are different techniques that learn differently but can be used in similar domains. Random Forest is a technique of Machine Learning while Neural Networks are exclusive to Deep Learning.
You're likely asking randomForest
to create the proximity matrix for the data, which if you think about it, will be insanely big: 1 million x 1 million. A matrix this size would be required no matter how small you set sampsize
. Indeed, simply Googling the error message seems to confirm this, as the package author states that the only place in the entire source code where n,n)
is found is in calculating the proximity matrix.
But it's hard to help more, given that you've provided no details about the actual code you're using.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With