We are trying to experiment with the gbm package on quite a large dataset (~140 million rows) and we have run into a problem with the memory requirements of R.
We have tried combining the 'gbm' and 'bigmemory' packages without success, and our next thought was to modify gbm's C++ source code to draw data from a local database where we have stored our dataset.
So, we were wondering if there is a more appropriate or well-known practice for changing the data allocation inside the C++ code of gbm. Has anyone tried something similar? A sketch of the failed bigmemory attempt is below.
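To illustrate the problem, here is a minimal sketch of the gbm + bigmemory route we tried (file name, column names, and fitting parameters are hypothetical). A file-backed big.matrix keeps the data on disk, but gbm()'s formula interface expects a data.frame, so the only way to feed it the data is to materialize everything in RAM, which defeats the purpose at ~140 million rows:

```r
library(bigmemory)
library(gbm)

# Disk-backed matrix; the data itself stays out of R's heap
bm <- read.big.matrix("dataset.csv", type = "double", header = TRUE,
                      backingfile = "dataset.bin",
                      descriptorfile = "dataset.desc")

# gbm(y ~ ., data = bm, ...)   # fails: bm is not a data.frame

df  <- as.data.frame(bm[, ])   # pulls ALL rows back into RAM
fit <- gbm(y ~ ., data = df, distribution = "gaussian", n.trees = 100)
```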
I’m not familiar with the gbm package, but if it works on data frames or vectors of some kind, you could use the ff package.
Quote: The ff package provides data structures that are stored on disk but behave (almost) as if they were in RAM by transparently mapping only a section (pagesize) in main memory...
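Here is a hedged sketch of what that could look like (the file name, column name, and sample size are assumptions, not tested against your data). Note that ff only solves the storage side: gbm itself still needs an in-memory data.frame, so the practical pattern is to keep the full table on disk and pull a manageable sample, or successive chunks, into RAM for fitting:

```r
library(ff)
library(gbm)

# Read the CSV into a disk-backed ffdf; only pages are mapped into RAM
big <- read.csv.ffdf(file = "dataset.csv", header = TRUE)

# Draw a sample that fits in memory; row-indexing an ffdf with an
# integer vector returns an ordinary in-memory data.frame
set.seed(1)
idx   <- sample(nrow(big), 1e6)
train <- big[idx, ]

fit <- gbm(y ~ ., data = train, distribution = "gaussian",
           n.trees = 100, interaction.depth = 3)
```

If sampling is not acceptable, you would still have to change gbm's internals to stream chunks from the ffdf, which brings you back to modifying the C++ code.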