
How to create a large data frame in R with or without creating a matrix first and then converting it to a data.frame?

I need to create a matrix with 80,000 rows and 80,000 columns. After reading R-bloggers, I learned that the number of elements in a matrix cannot exceed 2^31 - 1. To work around this limit for my particular algorithm, I plan to use a data frame instead of a matrix. Is there a way to create an empty data frame of dimension 80000 x 80000 directly, without first creating a matrix and then converting it with as.data.frame as shown below?

myMatrix <- matrix(0, ncol = 40, nrow = 90)
myDataFrame <- as.data.frame(myMatrix)
asked Jun 06 '15 00:06 by DataBasterd


1 Answer

You could construct an empty data frame of size 80,000 x 80,000 as follows:

dat <- do.call(data.frame, replicate(80000, rep(FALSE, 80000), simplify=FALSE))
dim(dat)
# [1] 80000 80000
dat[1,1]
# [1] FALSE
dat[80000,80000]
# [1] FALSE

Basically, you build a list containing each column of the data frame (here with replicate and simplify=FALSE, so the result stays a list instead of being collapsed into a matrix), and then pass that list to the data.frame function via do.call.
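To make the construction easier to inspect, here is a minimal small-scale sketch of the same list-of-columns approach. The size n = 5 and the column names V1, V2, ... are illustrative assumptions, not part of the original answer; naming the list before calling data.frame is one way to get predictable column names.

```r
# Small-scale sketch of the list-of-columns construction (n = 5 is illustrative).
n <- 5

# One logical vector of length n per column; simplify = FALSE keeps a list.
cols <- replicate(n, rep(FALSE, n), simplify = FALSE)

# Assumed naming scheme (V1, V2, ...) so the columns get predictable names.
names(cols) <- paste0("V", seq_len(n))

# Each list element becomes one argument, and therefore one column, of data.frame().
small <- do.call(data.frame, cols)

dim(small)            # rows and columns of the result
sapply(small, class)  # every column should be logical
```

The same code should scale to larger n, subject to the memory and time costs noted in this answer.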

A few notes:

  1. You'd better have several dozen gigabytes of memory to have a chance of fitting this into your computer's memory (my R process shows 48 GB of allocated memory).
  2. This will be much slower than matrix allocation: in the 8000 x 8000 case, the data frame construction took 36 seconds and the matrix construction took 1 second. The full 80000 x 80000 data frame took 54 minutes to allocate.
  3. If your data is sparse, this is a wasteful option and you should use a sparse matrix.
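The notes above can be sketched in code: a rough estimate of the dense storage needed, followed by a sparse alternative. This assumes the Matrix package (a recommended package distributed with R) is installed; the two non-zero entries and their values are made up purely for illustration.

```r
library(Matrix)  # assumption: the recommended Matrix package is available

n <- 80000

# Rough dense-storage estimates, ignoring object overhead and temporary copies:
# logical and integer cells take 4 bytes each, doubles take 8 bytes each.
n^2 * 4 / 2^30   # GiB for logical columns, roughly 24
n^2 * 8 / 2^30   # GiB for a numeric matrix, roughly 48

# Sparse alternative: only the non-zero entries are stored.
# The positions and values below are illustrative placeholders.
s <- sparseMatrix(i = c(1, 80000), j = c(1, 80000),
                  x = c(3.5, 7), dims = c(n, n))

dim(s)
s[80000, 80000]  # a stored entry
s[1, 2]          # an entry that was never stored reads as 0
```

Whether a sparse representation helps depends on how many entries your algorithm actually fills in; if most cells end up non-zero, it offers little benefit.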

Although allocating a matrix of this size did not fail on 64-bit Linux (R version 3.2.0), basic operations don't appear to work:

x <- matrix(0, nrow=80000, ncol=80000)
dim(x)
# [1] 80000 80000
x[1,1]
# Error: long vectors not supported yet: subset.c:733
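The "long vectors" in the error message refers to objects with more than 2^31 - 1 elements, which is the limit mentioned in the question. As a quick sanity check (a sketch, not part of the original answer), you can compare the planned element count against R's largest integer before allocating anything:

```r
n <- 80000

# n is stored as a double here, so n^2 does not overflow the integer range.
n_elements <- n^2

# TRUE means the object would need long-vector support.
n_elements > .Machine$integer.max
```

Because n is a double, the product is computed in floating point; with integer inputs (80000L * 80000L) R would return NA with an overflow warning instead.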
answered Oct 12 '22 00:10 by josliber