I am running 64-bit R 3.1 in a 64-bit Ubuntu environment with 400GB of RAM, and I am encountering a strange limitation when dealing with large matrices.
I have a numeric matrix called A, which is 4000 rows by 950,000 columns. When I try to access any element in it, I receive the following error:
Error: long vectors not supported yet: subset.c:733
Although my matrix was read in via scan, you can replicate the problem with the following code:
test <- matrix(1,4000,900000) #no error
test[1,1] #error
My Googling reveals this was a common error message prior to R 3.0, when 2^31 - 1 elements was the vector-length limit. However, that limit should no longer apply in my environment.
Should I not be using the native matrix type for this kind of matrix?
A matrix is just an atomic vector with a dimension attribute, which allows R to access it as a matrix. Your matrix is a vector of length 4000 * 900000, which is 3.6e+09 elements (the largest integer value is 2^31 - 1, approximately 2.147e+09). Subsetting a long vector is supported for atomic vectors (i.e. you can access elements beyond the 2.147e+09 limit with a single index). Just treat your matrix as a long vector.
If we remember that R fills matrices column-wise by default, then to retrieve, say, the value at test[ 2701 , 850000 ], we can access it via:
i <- ( 850000 - 1 ) * 4000 + 2701
test[i]
#[1] 1
Note that this really is long vector subsetting, because computing the same index in integer arithmetic overflows:
( 850000L - 1L ) * 4000L
#[1] NA
#Warning message:
#In (850000L - 1L) * 4000L : NAs produced by integer overflow
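The index arithmetic above can be wrapped in a small helper. This is a hypothetical function (not from the original answer) that computes the column-major linear index in double precision, so the multiplication cannot overflow the integer range; a small matrix is used here for demonstration, but the same formula applies to the 4000 x 900000 case.

```r
# Hypothetical helper: linear (column-major) index for element [row, col]
# of a matrix with `nrow` rows. Using as.numeric() forces double-precision
# arithmetic, so the product never hits the 2^31 - 1 integer limit.
linear_index <- function(row, col, nrow) {
  (col - 1) * as.numeric(nrow) + row
}

# Small demonstration matrix, filled column-wise with 1..12
m <- matrix(1:12, nrow = 3, ncol = 4)

m[linear_index(2, 3, nrow(m))]  # same element as m[2, 3]
#[1] 8
```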
An alternative, shorthand solution would be to first extract the row, and then take the i'th element of the resulting vector. For example ...
test <- matrix(1,4000,900000) #no error
test[1,1] #error
test[1, ][1] # no error
Of course, this incurs some overhead, as the whole row is copied/accessed first, but it is more straightforward to read. It also works the other way around: first extract the column, then index the row.
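On a small matrix (so it actually runs anywhere), both orderings can be checked against direct two-index subsetting; the equivalence is what makes this trick usable on the large matrix, where only the single-index forms succeed:

```r
# Small demonstration matrix, filled column-wise with 1..12
m <- matrix(1:12, nrow = 3, ncol = 4)

m[2, ][3]  # row first, then the 3rd element: same as m[2, 3]
m[, 3][2]  # column first, then the 2nd element: also m[2, 3]
#[1] 8
#[1] 8
```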