 

maximum size of a matrix in R

Tags: memory, r

I am using igraph to do some network analysis. As part of that, I have to create a matrix with 2 columns and as many rows as there are links. I have a large network (several million links) and creating this matrix didn't work after 3 hours of run time (no errors, just no result, and it shows "not responding").

What is the maximum size of such a character matrix? How long does it take to run?

I am running 64-bit R 2.14.1 on a Windows 7 machine with 4 GB of memory and a 2.67 GHz processor.

Thanks

ADDED: Thanks for the quick responses. They convinced me the problem wasn't the size of the matrix; it turned out I was using the wrong columns of another matrix to create this one.

Peter Flom asked Apr 02 '12 21:04


2 Answers

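In the R version you are using, the theoretical maximum length of a vector is 2147483647 (2^31 - 1) elements. For a 2-column matrix that is about 1 billion rows. (Since R 3.0.0, 64-bit builds support "long vectors" with more elements than that, although each matrix dimension is still limited to 2^31 - 1.)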
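The limit can be checked from within R: `.Machine$integer.max` is the largest value of R's signed 32-bit integer type, and integer-dividing it by 2 gives the row cap for a 2-column matrix under that element limit. A minimal sketch:

```r
# Largest signed 32-bit integer; this bounded vector length before R 3.0.0
max_len <- .Machine$integer.max
max_len          # 2147483647

# A 2-column matrix stores rows * 2 elements in one vector,
# so its row count is capped at half the vector limit
max_rows_2col <- max_len %/% 2
max_rows_2col    # 1073741823
```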

...but that amount of data does not fit in 4 GB of memory, and especially not as strings in a character matrix. Each string takes at least 96 bytes (object.size('a') == 96 in the R version of the time; the exact figure varies between versions), and each element of your matrix is a pointer (8 bytes) to such a string, although each unique string is stored only once.
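One way to get a rough idea of the footprint before building the full matrix is to compute the pointer cost directly and measure a small sample with `object.size()`. The sketch below assumes a hypothetical 5 million links (the question only says "several million") and made-up node labels; it also measures the same edges stored as integer vertex ids for comparison. It is an estimate, not an exact accounting.

```r
set.seed(1)

# Hypothetical size: the question only says "several million" links
n_links <- 5e6

# Each element of a character matrix is an 8-byte pointer on 64-bit R,
# so the pointer array alone for a 2-column matrix needs:
pointer_bytes <- n_links * 2 * 8
pointer_bytes / 2^20   # about 76 MiB, before counting the strings themselves

# Measure a small sample with made-up node labels to see the string overhead
sample_n <- 1e4
labels <- paste0("node", seq_len(sample_n))
m_chr <- matrix(sample(labels, sample_n * 2, replace = TRUE), ncol = 2)
print(object.size(m_chr), units = "MB")

# The same number of edges as integer vertex ids: 4 bytes per element, no strings
m_int <- matrix(sample.int(sample_n, sample_n * 2, replace = TRUE), ncol = 2)
print(object.size(m_int), units = "MB")
```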

So what typically happens is that the machine starts using virtual memory and swapping. Heavy swapping typically kills all hope of ever finishing in this century, especially on Windows.

But if you are using a package (igraph?) and you're asking it to produce the matrix, it probably does a lot of internal work and creates lots of auxiliary objects. So even if you're nowhere near the memory limit for the single result matrix, the algorithm used to produce it can run out of memory. It can also be non-linear (quadratic or worse) in time, which would again kill all hope of ever finishing in this century...

A good way to investigate is to time it on a small graph (e.g. using system.time), and then again after doubling the graph size a couple of times. Then you can see whether the time grows linearly or quadratically and estimate how long your big graph will take. If the prediction says a week, well then you know ;-)
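The doubling experiment could look roughly like this. `build_edge_matrix()` is a hypothetical stand-in for whatever step is slow (the real igraph call or matrix-building code would go there), the sizes are arbitrary, and the 5 million target is an assumption. Fitting a line to log(time) against log(size) gives the growth exponent: near 1 suggests linear scaling, near 2 quadratic.

```r
# Hypothetical stand-in for the slow step; replace with the real code
build_edge_matrix <- function(n) {
  from <- paste0("v", sample.int(n, n, replace = TRUE))
  to   <- paste0("v", sample.int(n, n, replace = TRUE))
  cbind(from, to)
}

# Time the step at doubling sizes (elapsed wall-clock seconds)
sizes <- c(5e4, 1e5, 2e5, 4e5)
times <- sapply(sizes, function(n) system.time(build_edge_matrix(n))[["elapsed"]])

# Fit log(time) = a + b * log(size) and extrapolate to a target size
estimate_runtime <- function(sizes, times, target) {
  fit <- lm(log(times) ~ log(sizes))
  b <- unname(coef(fit)[2])
  pred <- exp(predict(fit, newdata = data.frame(sizes = target)))
  list(exponent = b, seconds = unname(pred))
}

# The log-log fit needs strictly positive timings
if (all(times > 0)) {
  print(estimate_runtime(sizes, times, target = 5e6))
}
```

Timings on tiny inputs are noisy, so the extrapolation is only a rough guide; larger starting sizes give a more reliable exponent.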

Tommy answered Oct 14 '22 00:10


R matrices can be addressed with a single index because they are really a vector with a dim attribute of length 2. In your R version, vectors are indexed by a signed 32-bit integer even in the 64-bit build, so a vector holds at most 2^31-1 elements and a 2-column matrix can have at most 2^30-1 rows.
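The "vector with a dim attribute" point can be seen directly on a small matrix: single-index access walks the underlying vector in column-major order, and removing the dim attribute leaves a plain vector. A minimal illustration:

```r
# A matrix is a vector plus a length-2 "dim" attribute
m <- matrix(1:6, ncol = 2)
attributes(m)   # $dim: 3 2
length(m)       # 6, the total number of elements

# Column-major order: [2, 2] skips the 3 elements of column 1, so it is element 5
m[2, 2]         # 5
m[5]            # 5

# Dropping the dim attribute leaves the plain underlying vector
v <- m
dim(v) <- NULL
identical(v, 1:6)   # TRUE
```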

A data.frame is a list of column vectors, each subject to that limit separately, so it allows up to 2^31-1 rows (and up to 2^31-1 columns), memory permitting.
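That a data.frame is a list of separate column vectors can be checked on a tiny edge list; the vertex names here are made up. Because each column is its own vector, the per-vector element limit applies to each column rather than to rows times columns. Memory use is of course still the binding constraint.

```r
# A data.frame is a list of equal-length column vectors
edges <- data.frame(from = c("a", "b", "c"), to = c("b", "c", "a"))
is.list(edges)          # TRUE
length(edges)           # 2, one list element per column
length(edges$from)      # 3, each column is its own vector
nrow(edges)             # 3
```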

James answered Oct 14 '22 02:10