I want to work in the data.table framework for various reasons not covered in this post. Does data.table have a sparse representation for indicator matrices, à la the Matrix package?
library(Matrix)
library(data.table)
set.seed(123409L)
ints <- sample.int(2L, 1e6, replace = TRUE, prob = c(0.9, 0.1)) - 1
m <- Matrix(ints, ncol= 1000)
dt <- data.table(matrix(ints, ncol= 1000))
pryr::object_size(m) # 1.22 MB
pryr::object_size(dt) # 8.1 MB
Assume that in the actual use case I have closer to 6e8 elements, and that growth is hypothetically unbounded.
Apologies in advance if this question has already been answered. I'm happy for it to be flagged as a duplicate; but I didn't find a duplicate via search.
As @Frank suggested in his comment, you can represent a sparse matrix efficiently in a data.table by storing the non-zero elements along with their indices as individual observations, i.e., in triplet form:
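# Coerce to triplet (i, j, x) storage; for this matrix the result is a dgTMatrix
m2 <- as(m, "TsparseMatrix")
# The slots are 0-based; adding 1L (not 1) keeps the indices as integers
dt2 <- data.table(i = m2@i + 1L, j = m2@j + 1L, value = m2@x)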
pryr::object_size(dt2) # 1.62 MB
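To illustrate that nothing is lost in the triplet form, here is a minimal sketch on a small made-up matrix (not the data from the question). It assumes the Matrix and data.table packages are installed; the names `m_small`, `dt_small`, `m_back` and `col_sums` are purely illustrative. It converts to triplet form, rebuilds the sparse matrix with `sparseMatrix()`, and computes column sums by grouping on `j`.

```r
library(Matrix)
library(data.table)

# Hypothetical 4 x 5 indicator matrix, filled column by column
vals <- c(1,0,0,0,  0,0,1,0,  0,0,0,0,  0,1,0,1,  0,0,0,0)
m_small <- Matrix(vals, ncol = 5L, sparse = TRUE)

# Triplet form: one row per non-zero element, with 1-based indices
t_small <- as(m_small, "TsparseMatrix")
dt_small <- data.table(i = t_small@i + 1L, j = t_small@j + 1L, value = t_small@x)

# Round trip: rebuild the sparse matrix from the long table.
# dims must be passed explicitly, because empty trailing rows or
# columns cannot be inferred from the indices alone.
m_back <- sparseMatrix(i = dt_small$i, j = dt_small$j, x = dt_small$value,
                       dims = dim(m_small))

# Column sums via grouping; columns without non-zero entries simply do not appear
col_sums <- dt_small[, .(total = sum(value)), by = j]
```

One thing to keep in mind with this layout is that the matrix dimensions are not stored in the table itself, so they need to be tracked separately if all-zero rows or columns matter.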
This data.table can also be constructed from dt:
dt2 <- melt(copy(dt)[,i:=.I], id.vars="i"
)[value>0][,j:=as.integer(variable)][,variable:=NULL]
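As a rough sketch of how the long table might then be used, the example below applies the same melt approach to a small made-up dense table (`dt_dense` and `long_dt` are illustrative names, not from the question) and keys the result on `(i, j)` so that individual cells can be looked up. It relies on the default `V1`, `V2`, ... column names that data.table assigns to an unnamed matrix, which is what makes `as.integer(variable)` equal to the column index.

```r
library(data.table)

# Hypothetical 4 x 5 dense indicator table with columns V1..V5
dense <- matrix(c(1,0,0,0,  0,0,1,0,  0,0,0,0,  0,1,0,1,  0,0,0,0), ncol = 5L)
dt_dense <- data.table(dense)

# Same steps as above: add a row index, melt to long form, keep non-zeros,
# and turn the factor of column names into an integer column index
long_dt <- melt(copy(dt_dense)[, i := .I], id.vars = "i")[
  value > 0][, j := as.integer(variable)][, variable := NULL]

# Key on (i, j) so that cell lookups use binary search
setkey(long_dt, i, j)

# A cell is "set" if a row exists for it; absent cells are implicit zeros
is_set_2_4 <- nrow(long_dt[.(2L, 4L), nomatch = NULL]) > 0
is_set_1_2 <- nrow(long_dt[.(1L, 2L), nomatch = NULL]) > 0
```

Note that the melt step materialises the full long table before filtering, so for very large inputs it may be preferable to build the triplet table directly from a sparse Matrix object instead.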