Large Matrices in R: long vectors not supported yet

Tags: r, vector, matrix

I am running 64-bit R 3.1 in a 64-bit Ubuntu environment with 400 GB of RAM, and I am encountering a strange limitation when dealing with large matrices.

I have a numeric matrix called A that is 4000 rows by 950,000 columns. When I try to access any element in it, I receive the following error:

Error: long vectors not supported yet: subset.c:733

Although my matrix was read in via scan, you can replicate the error with the following code:

test <- matrix(1,4000,900000) #no error
test[1,1] #error

My Googling reveals that this was a common error message before R 3.0, when vectors were limited to 2^31 - 1 elements. However, that limit should not apply in my environment.
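For reference, a minimal sketch of the arithmetic behind this limit: the old cap is R's largest integer, and the example matrix has more cells than that. The product is computed in double precision here, so it does not overflow.

```r
# Largest value of R's 32-bit integer type, i.e. 2^31 - 1
max_int <- .Machine$integer.max
print(max_int)             # 2147483647
print(max_int == 2^31 - 1) # TRUE

# Number of cells in the reproducible example (double arithmetic)
n_cells <- 4000 * 900000
print(n_cells)             # 3.6e+09
print(n_cells > max_int)   # TRUE, so the matrix needs long-vector support
```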

Should I not be using the native matrix type for this kind of matrix?

asked Jun 20 '14 21:06 by The_Anomaly



2 Answers

A matrix is just an atomic vector with a dimension attribute that allows R to access it as a matrix. Your example matrix is a vector of length 4000 * 900000, which is 3.6e+09 elements, while the largest integer value is 2^31 - 1 (approx. 2.147e+09). Subsetting a long vector (i.e. accessing elements beyond the 2.147e+09 limit) is supported for atomic vectors, so just treat your matrix as a long vector.
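To make the "vector with a dim attribute" point concrete, here is a small sketch on a 2 x 3 matrix (kept tiny so it runs anywhere). It shows that dropping or adding the `dim` attribute converts between a matrix and a plain vector, and that a `[row, col]` lookup matches a single-index lookup.

```r
# A 2 x 3 matrix is the vector 1:6 plus a dim attribute
m <- matrix(1:6, nrow = 2, ncol = 3)
print(attributes(m))     # $dim: 2 3

# Dropping the attribute leaves the underlying vector
v <- as.vector(m)
print(identical(v, 1:6)) # TRUE

# Adding a dim attribute to a plain vector turns it into a matrix
w <- 1:6
dim(w) <- c(2, 3)
print(identical(w, m))   # TRUE

# Column-major storage: m[2, 3] is element (3 - 1) * 2 + 2 = 6
print(m[2, 3] == m[6])   # TRUE
```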

Remember that R fills matrices column-wise by default, so the element at test[row, col] sits at position (col - 1) * nrow + row of the underlying vector. To retrieve, say, the value at test[2701, 850000] of the 4000-row matrix, we can use:

i <- (850000 - 1) * 4000 + 2701
test[i]
#[1] 1

Note that this really is long vector subsetting: the offset of column 850000 alone overflows R's 32-bit integer type:

849999L * 4000L
#[1] NA
#Warning message:
#In 849999L * 4000L : NAs produced by integer overflow
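The index arithmetic can be wrapped in a small helper. `lin_index` below is a hypothetical name, not a base R function; it works in double precision so indices past the integer limit do not overflow. Base R's `arrayInd()` does the reverse conversion (linear index to row/column), which gives a convenient round-trip check on a small matrix.

```r
# Hypothetical helper: column-major linear index of cell [row, col]
# in a matrix with n_row rows. as.numeric() forces double arithmetic,
# so results above .Machine$integer.max do not overflow.
lin_index <- function(row, col, n_row) {
  (as.numeric(col) - 1) * n_row + row
}

# Index of test[2701, 850000] in a 4000-row matrix
i <- lin_index(2701, 850000, 4000)
print(format(i, scientific = FALSE)) # "3399998701"
print(i > .Machine$integer.max)      # TRUE

# Round trip on a small matrix: arrayInd() maps a linear index back
# to its (row, col) pair
m <- matrix(seq_len(12), nrow = 3)
for (ri in seq_len(nrow(m))) {
  for (ci in seq_len(ncol(m))) {
    k <- lin_index(ri, ci, nrow(m))
    stopifnot(m[k] == m[ri, ci])
    stopifnot(all(arrayInd(k, dim(m)) == c(ri, ci)))
  }
}
```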
answered Oct 04 '22 02:10 by Simon O'Hanlon



An alternative quick-and-dirty solution is to first extract the row, and then use the column index as the element index of the resulting vector. For example:

test <- matrix(1,4000,900000) #no error 
test[1,1] #error
test[1, ][1] # no error

Of course, this adds some overhead, since the whole row is copied first, but it is more straightforward to read. It also works the other way round: extract the column first, then index by row.
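A quick sketch on a small matrix (not the 4000 x 900000 one) checking that both orders of two-step indexing return the same element as direct `[row, col]` indexing. The intermediate row or column is an ordinary vector of length ncol or nrow, well under the integer limit.

```r
m <- matrix(seq_len(20), nrow = 4, ncol = 5)

# Row first: m[2, ] is a length-5 vector, then take its 3rd element
via_row <- m[2, ][3]

# Column first: m[, 3] is a length-4 vector, then take its 2nd element
via_col <- m[, 3][2]

# All three should be 10, since (3 - 1) * 4 + 2 = 10
print(c(direct = m[2, 3], via_row = via_row, via_col = via_col))
```

For the asker's shape, extracting a column copies 4000 values while extracting a row copies 900,000, so column-first is likely the cheaper of the two orders.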

answered Oct 04 '22 03:10 by Stingery
