I want to work in the data.table framework, for reasons outside the scope of this post. Does data.table have a sparse representation for indicator matrices, à la the Matrix package?
library(Matrix)
library(data.table)
set.seed(123409L)
ints <- sample.int(2L, 1e6, replace = TRUE, prob = c(0.9, 0.1)) - 1
m <- Matrix(ints, ncol= 1000)
dt <- data.table(matrix(ints, ncol= 1000))
pryr::object_size(m) # 1.22 MB
pryr::object_size(dt) # 8.1 MB
Assume that the actual use case has closer to 6e8 elements and that growth is, hypothetically, unbounded.
Apologies in advance if this question has already been answered. I'm happy for it to be flagged as a duplicate; but I didn't find a duplicate via search.
As @Frank suggested in his comment, you can represent a sparse matrix efficiently in a data.table by storing non-zero elements along with their indices as individual observations, i.e., in triplet form:
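m2 <- as(m, "TsparseMatrix") # triplet form; as(m, "dgTMatrix") is deprecated in recent Matrix versions
dt2 <- data.table(i = m2@i + 1L, j = m2@j + 1L, value = m2@x) # slots are 0-based; 1L keeps the indices integer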
pryr::object_size(dt2) # 1.62 MB
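To make the triplet layout concrete, here is a minimal sketch on a toy 3 x 4 matrix. The data, the names `dense`, `sm`, `long`, `col_sums` and the helper `cell` are made up for illustration and are not part of the question. It assumes a Matrix version in which `as(x, "TsparseMatrix")` is available, and shows that grouped aggregation and keyed lookup work directly on the long table, with absent rows standing for zeros.

```r
library(Matrix)
library(data.table)

# Toy 3 x 4 indicator matrix (hypothetical data), filled column by column
dense <- matrix(c(0, 1, 0,
                  0, 0, 0,
                  1, 0, 1,
                  0, 0, 1), nrow = 3)
sm <- Matrix(dense, sparse = TRUE)

# Triplet (i, j, value) form: one row per non-zero cell, 1-based indices
tm <- as(sm, "TsparseMatrix")
long <- data.table(i = tm@i + 1L, j = tm@j + 1L, value = tm@x)
setkey(long, i, j)

# Column sums via grouping; all-zero columns simply do not appear
col_sums <- long[, .(total = sum(value)), by = j]

# Hypothetical helper: look up one cell, treating a missing (i, j) pair as zero
cell <- function(row, col) {
  hit <- long[.(as.integer(row), as.integer(col)), value, nomatch = NULL]
  if (length(hit)) hit else 0
}
```

Here `cell(2, 1)` finds a stored entry, while `cell(1, 1)` falls through to the zero default because no row with that index pair exists.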
This data.table can also be constructed from dt:
dt2 <- melt(copy(dt)[,i:=.I], id.vars="i"
    )[value>0][,j:=as.integer(variable)][,variable:=NULL]
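If matrix algebra is needed later, the long table can presumably be turned back into a sparse matrix with `Matrix::sparseMatrix()`. A minimal sketch, again on made-up toy data (the table `long` and the 3 x 4 dimensions are assumptions for illustration); the dimensions have to be supplied by hand, since the triplet table does not record them.

```r
library(Matrix)
library(data.table)

# Toy triplet table (hypothetical data): non-zero cells of a 3 x 4 indicator matrix
long <- data.table(i = c(2L, 1L, 3L, 3L),
                   j = c(1L, 3L, 3L, 4L),
                   value = c(1, 1, 1, 1))

# Rebuild the sparse matrix. dims is given explicitly because trailing
# all-zero rows or columns leave no trace in the triplet table.
sm <- sparseMatrix(i = long$i, j = long$j, x = long$value, dims = c(3L, 4L))

# Ordinary Matrix operations work again, e.g. column sums including zero columns
cs <- colSums(sm)
```

For a matrix on the order of 6e8 cells, the `m2`-based construction above is likely preferable to the `melt` route, since the latter needs the dense `dt` in memory first.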