Is there a `data.table` representation of sparse matrices / objects?

I want to work in the data.table framework for reasons outside the scope of this post. Does data.table have a sparse representation for indicator matrices, à la the Matrix package?

library(Matrix)
library(data.table)

set.seed(123409L)

# 1e6 draws from {0, 1}, roughly 10% of them ones
ints <- sample.int(2L, 1e6, replace = TRUE, prob = c(0.9, 0.1)) - 1

m <- Matrix(ints, ncol= 1000)
dt <- data.table(matrix(ints, ncol= 1000))

pryr::object_size(m) # 1.22 MB
pryr::object_size(dt) # 8.1 MB

Assume that the actual use case has closer to 6e8 elements and that growth is, hypothetically, unbounded.

Apologies in advance if this question has already been answered. I'm happy for it to be flagged as a duplicate, but I didn't find one via search.

Alex W asked Oct 29 '22 09:10


1 Answer

As @Frank suggested in his comment, you can represent a sparse matrix efficiently in a data.table by storing non-zero elements along with their indices as individual observations, i.e., in triplet form:

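m2 <- as(m, "TsparseMatrix")  # triplet storage (a dgTMatrix here); slots i and j are 0-based
dt2 <- data.table(i = m2@i + 1L, j = m2@j + 1L, value = m2@x)  # adding 1L keeps the indices integer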

pryr::object_size(dt2) # 1.62 MB
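
Going the other way, from the triplet table back to a sparse matrix, can be sketched as follows. This is a minimal example under stated assumptions: the small 20 x 5 input and the names `m_small`, `t_small`, `trip`, and `m_back` are hypothetical and not part of the original answer. It relies on `Matrix::sparseMatrix()`, which accepts 1-based `i`, `j`, and `x` vectors.

```r
library(Matrix)
library(data.table)

set.seed(1L)
# Hypothetical small input: a 20 x 5 indicator matrix with about 10% ones
x <- sample(0:1, 100L, replace = TRUE, prob = c(0.9, 0.1))
m_small <- Matrix(as.numeric(x), ncol = 5L, sparse = TRUE)

# Triplet (i, j, value) table with 1-based indices
t_small <- as(m_small, "TsparseMatrix")
trip <- data.table(i = t_small@i + 1L, j = t_small@j + 1L, value = t_small@x)

# Rebuild the sparse matrix. dims is passed explicitly because all-zero
# trailing rows or columns leave no trace in the triplets.
m_back <- sparseMatrix(i = trip$i, j = trip$j, x = trip$value,
                       dims = dim(m_small))

# Compare dense copies (fine at this size, not at 6e8 elements)
identical(as.matrix(m_back), as.matrix(m_small))
```

Because the triplets alone do not record the matrix dimensions, it is probably worth storing `dim(m)` next to the table if the round trip matters.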

The triplet table dt2 can also be constructed from dt:

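# j is the integer code of the factor `variable`, whose levels follow the column order of dt
dt2 <- melt(copy(dt)[, i := .I], id.vars = "i")[
  value > 0][, j := as.integer(variable)][, variable := NULL]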
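
If the data start out as a flat vector, as `ints` does in the question, the triplets can also be built without first creating a wide, dense data.table. The sketch below is one way to do that, not the answer's method. It assumes a column-major layout (what `matrix()` uses by default), and the names `ints_small`, `n_row`, `pos`, `trip`, and `col_sums` are hypothetical.

```r
library(data.table)

# Hypothetical small input: a 0/1 vector in column-major order, 4 rows x 3 columns
ints_small <- c(0, 1, 0, 0,
                0, 0, 0, 1,
                1, 0, 0, 0)
n_row <- 4L

# Positions of the non-zero entries in the flat vector
pos <- which(ints_small != 0)

# Column-major layout: position p sits in row (p - 1) %% n_row + 1
# and in column (p - 1) %/% n_row + 1
trip <- data.table(
  i     = (pos - 1L) %% n_row + 1L,
  j     = (pos - 1L) %/% n_row + 1L,
  value = ints_small[pos]
)

# Grouped aggregation works directly on the triplets, e.g. per-column sums.
# Columns with no non-zero entries do not appear in the result.
col_sums <- trip[, .(total = sum(value)), by = j]
```

Only the non-zero positions are materialised, so the intermediate objects scale with the number of ones rather than with the full matrix size.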
jan-glx answered Nov 09 '22 11:11