One of the new features of R 3.0.0 was the introduction of long vectors. However, .C() and .Fortran() do not accept long vector inputs. On R-bloggers I find:
This is a precaution as it is very unlikely that existing code will have been written to handle long vectors (and the R wrappers often assume that length(x) is an integer)
I work with the R package randomForest, and this package evidently relies on .Fortran(), since it crashes with the error message
Error in randomForest.default: long vectors (argument 20) are not supported in .Fortran
How can I overcome this problem? I use randomForest 4.6-7 (built under R 3.0.2) on a 64-bit Windows 7 computer.
The only way to guarantee that randomForest accepts your input data frame is to ensure that the vectors inside it do not exceed length 2^31 - 1 (i.e., are not long vectors). If you must start with a data frame containing long vectors, then you would have to subset it so that the vectors have an acceptable length. Here is one way you could subset a data frame to make it suitable for randomForest:
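# given data frame 'df' with long vectors
maxDim <- .Machine$integer.max   # 2^31 - 1, the largest non-long vector length
df_sub <- df[seq_len(maxDim), ]  # keep only the first maxDim rows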
However, there is a major problem with doing this: you would be throwing away all observations appearing in rows 2^31 or higher. In practice, you probably do not need that many observations to fit a random forest. The easy workaround is to take a statistically valid subsample of the original dataset, with a size that does not exceed 2^31 - 1. Store the data in ordinary (non-long) R vectors, and your randomForest calculation should run without this error.
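As a rough illustration of the subsampling idea, here is a minimal sketch that draws a simple random sample of rows without replacement. The helper name `subsample_rows` and its `max_rows` argument are hypothetical names chosen for this example, and the built-in `iris` data set only stands in for your real data. A simple random sample is just one possible scheme; a stratified sample may suit imbalanced classes better.

```r
# Minimal sketch: draw a simple random subsample of rows before model fitting.
# 'subsample_rows' and 'max_rows' are hypothetical names used for illustration.
subsample_rows <- function(data, max_rows) {
  n <- nrow(data)
  if (n <= max_rows) {
    return(data)  # already small enough, keep every row
  }
  # sample row indices without replacement; sorting preserves the original order
  keep <- sort(sample.int(n, size = max_rows))
  data[keep, , drop = FALSE]
}

set.seed(42)                        # make the subsample reproducible
small <- subsample_rows(iris, 100)  # iris stands in for the real data
dim(small)

# The subsample can then be passed to randomForest as usual, for example:
# library(randomForest)
# fit <- randomForest(Species ~ ., data = small)
```

In practice you would pick `max_rows` well below 2^31 - 1, since memory and run time are likely to become limiting long before that bound is reached.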