
How to merge big sparse matrices

I have a list of 25 large sparse matrices (one of them has 100M or more elements), and I need to merge them into one big sparse matrix.

For example, one matrix A may look like this (it is a submatrix of my real matrix of 100M elements):

> A
5 x 4 sparse Matrix of class "dgCMatrix"
              SKU
CustomerID         404     457     547     558     
  100002_24655       1       .       .       .       
  100003_46919       .       1       1       .       
  100007_46702       .       .       .       .       
  100012_47709       .       .       .       .       
  100013_46132       1       1       1       1 

> dput(A)
new("dgCMatrix"
    , i = c(0L, 4L, 1L, 4L, 1L, 4L, 4L)
    , p = c(0L, 2L, 4L, 6L, 7L)
    , Dim = c(5L, 4L)
    , Dimnames = structure(list(CustomerID = c("100002_24655", "100003_46919", 
"100007_46702", "100012_47709", "100013_46132"), SKU = c("404", 
"457", "547", "558")), .Names = c("CustomerID", "SKU"
))
    , x = c(1, 1, 1, 1, 1, 1, 1)
    , factors = list()
)

The other matrix, B, may look like this:

> B
7 x 5 sparse Matrix of class "dgCMatrix"
               SKU
CustomerID          191     404     558     715     787        
  100002_24655        .       .       .       .       .              
  100007_46702        1       1       1       1       1              
  100012_47709        .       .       1       .       .              
  100013_46132        .       .       .       .       1              
  100014_46400        .       .       .       .       .             
  100014_605414       1       1       1       .       .              
  100014_631294       .       .       1       1       1              

> dput(B)
new("dgCMatrix"
    , i = c(1L, 5L, 1L, 5L, 1L, 2L, 5L, 6L, 1L, 6L, 1L, 3L, 6L)
    , p = c(0L, 2L, 4L, 8L, 10L, 13L)
    , Dim = c(7L, 5L)
    , Dimnames = structure(list(CustomerID = c("100002_24655", "100007_46702", 
"100012_47709", "100013_46132", "100014_46400", "100014_605414", 
"100014_631294"), SKU = c("191", "404", "558", "715", 
"787")), .Names = c("CustomerID", "SKU"))
    , x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
    , factors = list()
)

The output should look like this (the first block of rows comes from the first matrix and the second block from the second matrix; I separated them with a blank line for readability):

12 x 7 sparse Matrix of class "dgCMatrix"    
             404  457  547  558  191  715  787    
     [1, ]     1    .    .    .    .    .    .     
     [2, ]     .    1    1    .    .    .    .
     [3, ]     .    .    .    .    .    .    .
     [4, ]     .    .    .    .    .    .    .
     [5, ]     1    1    1    1    .    .    .

     [6, ]     .    .    .    .    .    .    .
     [7, ]     1    .    .    1    1    1    1
     [8, ]     .    .    .    1    .    .    .
     [9, ]     .    .    .    .    .    .    1
     [10,]     .    .    .    .    .    .    .
     [11,]     1    .    .    1    1    .    .
     [12,]     .    .    .    1    .    1    1

That means I want to merge by column names. How can I merge all 25 sparse matrices?

Martina Zapletalová asked Oct 18 '22 06:10


1 Answer

I edited dww's answer slightly to avoid the error I mentioned in the comments. It is a bit slow, but my matrices are really big.

> proc.time() - ptm
   user  system elapsed 
572.384 213.179 793.550

This is the edited code:

library(Matrix)

# Row-binds a list of sparse matrices, aligning their columns by name.
merge.sparse = function(M.list) {
  A = M.list[[1]]

  for (i in seq_along(M.list)[-1]) { # i indexes the remaining matrices; safe for a one-element list
    # finding what's missing
    misA = colnames(M.list[[i]])[!colnames(M.list[[i]]) %in% colnames(A)]
    misB = colnames(A)[!colnames(A) %in% colnames(M.list[[i]])]

    misAl = as.vector(numeric(length(misA)), "list")
    names(misAl) = misA
    misBl = as.vector(numeric(length(misB)), "list")
    names(misBl) = misB

    ## adding missing columns to initial matrices
    An = Reduce(cbind, c(A, misAl))
    if (length(misA) > 0)
       {
       lenA <- ncol(An)-length(misA)+1
       colnames(An)[lenA:ncol(An)] = names(misAl)
       }

    Bn = Reduce(cbind, c(M.list[[i]], misBl))
    if(length(misB) > 0)
       {
       lenB <- ncol(Bn)-length(misB)+1
       colnames(Bn)[lenB:ncol(Bn)] = names(misBl)
       }

    Bn <- Bn[,colnames(An)]

    # final bind
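    # columns are already aligned above; rbind() has no use.names argument,
    # so passing one would bind it as an extra row
    A = rbind(An, Bn)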
    print(c(length(M.list), i))
  } 
  A
}
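
If the repeated cbind/rbind is the slow part, one thing that might help is to build the result in a single step from (row, column, value) triplets. This is only a sketch, not something I have timed on matrices of this size. The function name merge_sparse_triplets is made up for illustration, and it assumes every matrix in the list is a numeric sparse matrix (e.g. dgCMatrix) with both row and column names set:

```r
library(Matrix)

# Hypothetical helper (not part of the original answer): stacks the matrices
# row-wise and aligns columns by name with one sparseMatrix() call.
# Assumes every element of M.list has row names and column names.
merge_sparse_triplets <- function(M.list) {
  # union of all column names, in order of first appearance
  all_cols <- unique(unlist(lapply(M.list, colnames)))
  # number of result rows that come before each matrix
  row_offset <- c(0L, cumsum(vapply(M.list, nrow, integer(1))))

  # non-zero entries of each matrix, shifted into result coordinates
  trip <- lapply(seq_along(M.list), function(k) {
    s <- summary(M.list[[k]])  # 1-based i, j and value x of the non-zeros
    list(i = s$i + row_offset[k],
         j = match(colnames(M.list[[k]]), all_cols)[s$j],
         x = s$x)
  })

  sparseMatrix(
    i = unlist(lapply(trip, `[[`, "i")),
    j = unlist(lapply(trip, `[[`, "j")),
    x = unlist(lapply(trip, `[[`, "x")),
    dims = c(row_offset[length(row_offset)], length(all_cols)),
    dimnames = list(unlist(lapply(M.list, rownames)), all_cols)
  )
}
```

Calling merge_sparse_triplets(list(A, B)) on the two example matrices from the question should give the 12 x 7 layout shown there, with columns in the order 404, 457, 547, 558, 191, 715, 787. The triplets are held in memory all at once, so this trades extra memory for fewer passes over the data.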
Martina Zapletalová answered Nov 01 '22 11:11