
r - Binding sparse matrices of different sizes on rows

I am attempting to use the Matrix package to bind two sparse matrices of different sizes together. The binding is by rows, with columns matched by their names.

Table A:

ID     | AAAA   | BBBB   |
------ | ------ | ------ |
XXXX   | 1      | 2      |

Table B:

ID     | BBBB   | CCCC   |
------ | ------ | ------ |
YYYY   | 3      | 4      |

Binding table A and B:

ID     | AAAA   | BBBB   | CCCC   |
------ | ------ | ------ | ------ |
XXXX   | 1      | 2      |        |
YYYY   |        | 3      | 4      |
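
As a minimal sketch of the desired result, the two example tables can be built as sparse matrices with the Matrix package and bound after padding each one with empty columns for the names it lacks. `pad_columns` is a hypothetical helper written for this illustration, not a function of Matrix or slam, and this naive approach is only meant to show the target output on small inputs:

```r
library(Matrix)

# Table A and Table B from the example, as sparse matrices with dimnames
A <- sparseMatrix(i = c(1, 1), j = c(1, 2), x = c(1, 2),
                  dimnames = list("XXXX", c("AAAA", "BBBB")))
B <- sparseMatrix(i = c(1, 1), j = c(1, 2), x = c(3, 4),
                  dimnames = list("YYYY", c("BBBB", "CCCC")))

# Hypothetical helper: expand m to the full column set,
# leaving the added columns empty
pad_columns <- function(m, all_cols) {
  missing_cols <- setdiff(all_cols, colnames(m))
  if (length(missing_cols) == 0) {
    return(m[, all_cols, drop = FALSE])
  }
  filler <- sparseMatrix(i = integer(0), j = integer(0), x = numeric(0),
                         dims = c(nrow(m), length(missing_cols)),
                         dimnames = list(rownames(m), missing_cols))
  # Append the empty columns, then reorder to the common column order
  cbind(m, filler)[, all_cols, drop = FALSE]
}

all_cols <- union(colnames(A), colnames(B))
AB <- rbind(pad_columns(A, all_cols), pad_columns(B, all_cols))
```

Here `AB` has rows XXXX and YYYY and columns AAAA, BBBB, CCCC, with the blank cells of the table stored as implicit zeros.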

The intention is to insert a large number of small matrices into a single large matrix, to enable continuous querying and update/inserts.

As far as I can tell, neither the Matrix nor the slam package has functionality to handle this.

Similar questions have been asked in the past, but it seems no solution has been found:

Post 1: in-r-when-using-named-rows-can-a-sparse-matrix-column-be-added-concatenated

Post 2: bind-together-sparse-model-matrices-by-row-names

Ideas on how to solve it will be highly appreciated.

Best regards,

Frederik

Frederik Andersen asked Dec 14 '22 23:12


1 Answer

For my purposes (a very sparse matrix with millions of rows and tens of thousands of columns, more than 99.9% of the values empty), this was still much too slow. What worked was the code below; it might be helpful to others as well:

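library(Matrix)

merge.sparse <- function(listMatrixes) {
  # Takes a list of numeric sparse matrices with different (named) columns
  # and stacks them row-wise, matching columns by name.
  allColnames <- sort(unique(unlist(lapply(listMatrixes, colnames))))
  matrixToReturn <- NULL
  for (currentMatrix in listMatrixes) {
    # Position of each column of the current matrix in the combined column set
    newColLocations <- match(colnames(currentMatrix), allColnames)
    # Triplet form keeps row indices, column indices and values aligned,
    # including negative entries (the @i and @j slots are 0-based)
    triplets <- as(currentMatrix, "TsparseMatrix")
    newMatrix <- sparseMatrix(i = triplets@i + 1L,
                              j = newColLocations[triplets@j + 1L],
                              x = triplets@x,
                              # nrow() keeps trailing all-zero rows
                              dims = c(nrow(currentMatrix), length(allColnames)))
    rownames(newMatrix) <- rownames(currentMatrix)
    if (is.null(matrixToReturn)) {
      matrixToReturn <- newMatrix
    } else {
      matrixToReturn <- rbind2(matrixToReturn, newMatrix)
    }
  }
  colnames(matrixToReturn) <- allColnames
  matrixToReturn
}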
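
If the repeated row-binding becomes a bottleneck for long lists, one possible variant is to collect the (row, column, value) triplets of all matrices first and call `sparseMatrix()` once. The sketch below assumes numeric general sparse matrices (for example `dgCMatrix`) with column names, and `bind_by_colnames` is a hypothetical name chosen for this illustration:

```r
library(Matrix)

# Hypothetical helper: stack a list of sparse matrices by column name,
# building the result with a single sparseMatrix() call.
bind_by_colnames <- function(mats) {
  all_cols <- sort(unique(unlist(lapply(mats, colnames))))
  # Row offset of each matrix inside the stacked result
  row_offsets <- cumsum(c(0L, vapply(mats, nrow, integer(1))))
  total_rows <- row_offsets[length(row_offsets)]
  parts <- lapply(seq_along(mats), function(k) {
    trip <- as(mats[[k]], "TsparseMatrix")  # @i and @j are 0-based
    list(i = trip@i + 1L + row_offsets[k],
         j = match(colnames(mats[[k]]), all_cols)[trip@j + 1L],
         x = trip@x)
  })
  # Keep row names only if every input matrix supplies them
  row_names <- unlist(lapply(mats, rownames))
  if (length(row_names) != total_rows) row_names <- NULL
  sparseMatrix(i = unlist(lapply(parts, `[[`, "i")),
               j = unlist(lapply(parts, `[[`, "j")),
               x = unlist(lapply(parts, `[[`, "x")),
               dims = c(total_rows, length(all_cols)),
               dimnames = list(row_names, all_cols))
}

# The two tables from the question
A <- sparseMatrix(i = c(1, 1), j = c(1, 2), x = c(1, 2),
                  dimnames = list("XXXX", c("AAAA", "BBBB")))
B <- sparseMatrix(i = c(1, 1), j = c(1, 2), x = c(3, 4),
                  dimnames = list("YYYY", c("BBBB", "CCCC")))
AB <- bind_by_colnames(list(A, B))
```

Whether this is faster in practice depends on the data, so it is worth timing against the loop version on a representative sample.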
Valentin answered Jan 06 '23 12:01