
Convert data from data.table to matrix efficiently (speed and memory)

I have a data.table of roughly 20,000 x 20,000. How do I convert it to a matrix efficiently in terms of speed and memory?

I tried m = as.matrix(dt), but it takes very long and produces many warnings. df = data.frame(dt) also takes very long and ends up hitting the memory limit.

Is there an efficient way to do this? Or is there simply a function in data.table that returns dt as a matrix (as required to feed it into a statistical model using the glmnet package)?

Simply wrapping it in as.matrix gives me the error below:

    x = as.matrix(dt)

    Error: cannot allocate vector of size 2.9 Gb
    In addition: Warning messages:
      1: In unlist(X, recursive = FALSE, use.names = FALSE) : Reached total allocation of 8131Mb: see help(memory.size)
      2: In unlist(X, recursive = FALSE, use.names = FALSE) : Reached total allocation of 8131Mb: see help(memory.size)
      3: In unlist(X, recursive = FALSE, use.names = FALSE) : Reached total allocation of 8131Mb: see help(memory.size)
      4: In unlist(X, recursive = FALSE, use.names = FALSE) : Reached total allocation of 8131Mb: see help(memory.size)

My OS: 64-bit Windows 7 with 8 GB of RAM. Windows Task Manager has shown Rgui.exe using more than 4 GB before, and it was still fine.
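
For context, here is a rough back-of-the-envelope estimate of how much memory a single dense matrix of this size needs. The 20,000 x 20,000 dimensions come from the approximate figure in the question, and the per-cell sizes are the usual ones for 64-bit R, so treat the numbers as ballpark values rather than an exact account of the failed allocation:

```r
# Assumed dimensions, taken from the "~20,000 x 20,000" in the question.
n_cells <- 20000 * 20000

# Approximate payload per cell: 4 bytes for integer, 8 for double, and
# 8 for character (one pointer per cell on 64-bit R, not counting the
# strings themselves).
bytes_per_cell <- c(integer = 4, double = 8, character = 8)

# Size of one copy of the matrix in GiB for each storage type.
gib <- n_cells * bytes_per_cell / 1024^3
round(gib, 2)
```

Under these assumptions an 8-byte-per-cell copy is close to 3 GiB, the same range as the failed 2.9 Gb allocation, while an integer matrix needs about half that. Any temporary copy made during conversion comes on top of the original table.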

Gibson Gay asked Oct 02 '12 14:10




2 Answers

Try:

    result <- as.matrix(tidytext::cast_sparse(dat_table,
                                              column_name_of_rows,
                                              column_name_of_columns,
                                              column_name_of_values))

This should be fast and memory-efficient, provided dat_table is in long format, with one row per (row, column, value) combination.
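
As a minimal sketch of the same idea without tidytext, the snippet below builds a sparse matrix from long-format triples using Matrix::sparseMatrix. This is a swapped-in illustration, not tidytext's implementation, and the toy data and the names long, rows, cols, sp and dense are hypothetical:

```r
library(Matrix)

# Hypothetical long-format data: one row per non-zero cell.
long <- data.frame(
  row   = c("r1", "r1", "r2", "r3"),
  col   = c("c1", "c3", "c2", "c3"),
  value = c(5, 1, 2, 7)
)

# Map row and column labels to integer positions.
rows <- factor(long$row)
cols <- factor(long$col)

# Build a compressed sparse matrix from the (i, j, x) triples.
sp <- sparseMatrix(
  i = as.integer(rows),
  j = as.integer(cols),
  x = long$value,
  dims = c(nlevels(rows), nlevels(cols)),
  dimnames = list(levels(rows), levels(cols))
)

# Densify only if the downstream code really needs a base matrix;
# this step allocates every cell, including the zeros.
dense <- as.matrix(sp)
```

If the data is mostly zeros, it may be worth skipping the final as.matrix() call: as far as I know, glmnet accepts a sparse matrix from the Matrix package as its x argument.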

P. Denelle answered Oct 14 '22 07:10



@GibsonGay:

I made the mistake of including a character column in the matrix, which coerced every column to character. Removing that column allowed an integer matrix to be built; the conversion then succeeded without errors or warnings, and the model ran fine.
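
The effect described above can be reproduced on a small table. This is only a sketch: the toy data and the names id, x1, x2 and num_cols are made up for illustration, not taken from the original data.

```r
library(data.table)

# Hypothetical toy table: one character ID column plus integer columns.
dt <- data.table(id = c("a", "b", "c"), x1 = 1:3, x2 = 4:6)

# Including the character column coerces the whole matrix to character.
m_bad <- as.matrix(dt)
typeof(m_bad)   # "character"

# Keep only the numeric (integer or double) columns before converting.
num_cols <- names(dt)[vapply(dt, is.numeric, logical(1))]
m_good <- as.matrix(dt[, ..num_cols])
typeof(m_good)  # "integer"
```

Checking typeof() on a small slice of the real table before converting all of it is a cheap way to catch a stray non-numeric column.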

zx8754 answered Oct 14 '22 08:10
