Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

fast melt large 2d matrix to 3 column data.table

Tags:

r

data.table

I have a large matrix num [1:62410, 1:48010]

I want this in a long format data.table

e.g.

   Var1 Var2     value
1:    1    1 -4227.786
2:    2    1 -4211.908
3:    3    1 -4197.034
4:    4    1 -4183.645
5:    5    1 -4171.692
6:    6    1 -4161.634

minimal example

m = matrix(1:5, nrow = 1000, ncol = 1000)
x = data.table(reshape2::melt(m))

ideally I'd want the columns names x, y and value at the same time.

Previously I've been using data.table(melt(mymatrix)). But judging by the warnings that reshape2::melt is deprecated, this is probably not optimal in terms of speed, what would be the best "data.table" way of solving this?

the following do not answer my question: Fast melted data.table operations Proper/fastest way to reshape a data.table

Other answers refer to the deprecated reshape2 package

like image 705
BetaScoo8 Avatar asked Dec 31 '22 01:12

BetaScoo8


2 Answers

Here's an example:

# example matrix
m = matrix(1:12, nrow = 4)

# load data table
library(data.table)

We can extract the data, row and column info directly and it should be pretty fast:

dt = data.table(
  row = rep(seq_len(nrow(m)), ncol(m)), 
  col = rep(seq_len(ncol(m)), each = nrow(m)), 
  value = c(m)
)

The result is:

    row col value
 1:   1   1     1
 2:   2   1     2
 3:   3   1     3
 4:   4   1     4
 5:   1   2     5
 6:   2   2     6
 7:   3   2     7
 8:   4   2     8
 9:   1   3     9
10:   2   3    10
11:   3   3    11
12:   4   3    12
like image 129
talat Avatar answered Jan 06 '23 20:01

talat


There is a as.data.table method for array which will do the trick for you.

dim(m) <- c(dim(m), 1L)
as.data.table(m)

In future, when posting questions on SO, please provide minimal example.

I now looked at the source of it and I see it may not be very efficient, because it materializes all NA values, and then removes them.

like image 41
jangorecki Avatar answered Jan 06 '23 20:01

jangorecki