
R convert matrix or data frame to sparseMatrix

I have a regular matrix (non-sparse) that I would like to convert to a sparseMatrix (using the Matrix package). Is there a function to do this or do I need to do a bunch of loops?

For example:

    > regMat <- matrix(0, nrow=10, ncol=10)
    > regMat[3,5] <- round(runif(1),2)*100
    > regMat[2,8] <- round(runif(1),2)*100
    > regMat[8,4] <- round(runif(1),2)*100
    > regMat[1,6] <- round(runif(1),2)*100
    > regMat[7,4] <- round(runif(1),2)*100
    > regMat
          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
     [1,]    0    0    0    0    0   49    0    0    0     0
     [2,]    0    0    0    0    0    0    0   93    0     0
     [3,]    0    0    0    0   20    0    0    0    0     0
     [4,]    0    0    0    0    0    0    0    0    0     0
     [5,]    0    0    0    0    0    0    0    0    0     0
     [6,]    0    0    0    0    0    0    0    0    0     0
     [7,]    0    0    0    8    0    0    0    0    0     0
     [8,]    0    0    0   14    0    0    0    0    0     0
     [9,]    0    0    0    0    0    0    0    0    0     0
    [10,]    0    0    0    0    0    0    0    0    0     0

Any suggestions?

screechOwl asked May 11 '12 16:05



2 Answers

Here are two options:

    library(Matrix)

    A <- as(regMat, "sparseMatrix")       # see also `vignette("Intro2Matrix")`
    B <- Matrix(regMat, sparse = TRUE)    # Thanks to Aaron for pointing this out

    identical(A, B)
    # [1] TRUE
    A
    # 10 x 10 sparse Matrix of class "dgCMatrix"
    #
    #  [1,] . . .  .  . 45 .  . . .
    #  [2,] . . .  .  .  . . 59 . .
    #  [3,] . . .  . 95  . .  . . .
    #  [4,] . . .  .  .  . .  . . .
    #  [5,] . . .  .  .  . .  . . .
    #  [6,] . . .  .  .  . .  . . .
    #  [7,] . . . 23  .  . .  . . .
    #  [8,] . . . 63  .  . .  . . .
    #  [9,] . . .  .  .  . .  . . .
    # [10,] . . .  .  .  . .  . . .
Josh O'Brien answered Sep 23 '22 18:09


Josh's answer is fine, but here are more options and explanation.

Nitpick: "I have a regular matrix (non-sparse)..." Actually, you do have a sparse matrix (a matrix with mostly 0s); it is just stored in an uncompressed (dense) format. Your goal is to put it in a compressed storage format.

Sparse matrices can be stored in several compressed formats. Compressed Sparse Column (CSC) and Compressed Sparse Row (CSR) are the two dominant ones. as(regMat, "sparseMatrix") converts your matrix to class dgCMatrix, which is compressed sparse column. This is usually what you want, but I prefer to be explicit about it.

    library(Matrix)

    matCSC <- as(regMat, "dgCMatrix")  # compressed sparse column CSC
    matCSC
    10 x 10 sparse Matrix of class "dgCMatrix"

     [1,] . . .  .  . 57 .  . . .
     [2,] . . .  .  .  . . 27 . .
     [3,] . . .  . 90  . .  . . .
     [4,] . . .  .  .  . .  . . .
     [5,] . . .  .  .  . .  . . .
     [6,] . . .  .  .  . .  . . .
     [7,] . . . 91  .  . .  . . .
     [8,] . . . 37  .  . .  . . .
     [9,] . . .  .  .  . .  . . .
    [10,] . . .  .  .  . .  . . .

    matCSR <- as(regMat, "dgRMatrix")  # compressed sparse row CSR
    matCSR
    10 x 10 sparse Matrix of class "dgRMatrix"

     [1,] . . .  .  . 57 .  . . .
     [2,] . . .  .  .  . . 27 . .
     [3,] . . .  . 90  . .  . . .
     [4,] . . .  .  .  . .  . . .
     [5,] . . .  .  .  . .  . . .
     [6,] . . .  .  .  . .  . . .
     [7,] . . . 91  .  . .  . . .
     [8,] . . . 37  .  . .  . . .
     [9,] . . .  .  .  . .  . . .
    [10,] . . .  .  .  . .  . . .

While these look and behave the same on the surface, internally they store data differently. CSC is faster for retrieving columns of data while CSR is faster for retrieving rows. They also take up different amounts of space depending on the structure of your data.
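To make the "stored differently" point concrete, here is a minimal sketch that inspects the internal slots of the two formats. It assumes the same five non-zero entries as the output shown above, with the values hard-coded so the result is reproducible; the CSR copy is made with as(matCSC, "RsparseMatrix"), which is my choice for this illustration rather than something from the answer itself.

```r
library(Matrix)

# Same five non-zero entries as above, with fixed values (assumption:
# hard-coded instead of runif() so the result is reproducible)
matCSC <- sparseMatrix(
  i = c(3, 2, 8, 1, 7),
  j = c(5, 8, 4, 6, 4),
  x = c(90, 27, 37, 57, 91),
  dims = c(10, 10)
)

# Convert the CSC matrix to CSR via the virtual class "RsparseMatrix"
matCSR <- as(matCSC, "RsparseMatrix")

# CSC: non-zero values are laid out column by column
matCSC@x   # the non-zero values
matCSC@i   # 0-based row index of each value
matCSC@p   # 0-based offset where each column starts in @x (length ncol + 1)

# CSR: non-zero values are laid out row by row
matCSR@x   # the non-zero values
matCSR@j   # 0-based column index of each value
matCSR@p   # 0-based offset where each row starts in @x (length nrow + 1)
```

Pulling a column out of the CSC version only needs the slice of @x between two consecutive @p offsets, which is why column access is cheap there; the CSR version offers the same shortcut for rows.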

Furthermore, in this example you're converting an uncompressed sparse matrix to a compressed one. Usually you do this to save memory, so building an uncompressed matrix just to convert it to compressed form defeats the purpose. In practice it's more common to construct a compressed sparse matrix from a table of (row, column, value) triplets. You can do this with Matrix's sparseMatrix() function.

    # Make data.frame of (row, column, value) triplets
    df <- data.frame(
      rowIdx = c(3,2,8,1,7),
      colIdx = c(5,8,4,6,4),
      val = round(runif(n = 5), 2) * 100
    )

    df
      rowIdx colIdx val
    1      3      5  90
    2      2      8  27
    3      8      4  37
    4      1      6  57
    5      7      4  91

    # Build CSC matrix
    matSparse <- sparseMatrix(
      i = df$rowIdx,
      j = df$colIdx,
      x = df$val,
      dims = c(10, 10)
    )

    matSparse
    10 x 10 sparse Matrix of class "dgCMatrix"

     [1,] . . .  .  . 57 .  . . .
     [2,] . . .  .  .  . . 27 . .
     [3,] . . .  . 90  . .  . . .
     [4,] . . .  .  .  . .  . . .
     [5,] . . .  .  .  . .  . . .
     [6,] . . .  .  .  . .  . . .
     [7,] . . . 91  .  . .  . . .
     [8,] . . . 37  .  . .  . . .
     [9,] . . .  .  .  . .  . . .
    [10,] . . .  .  .  . .  . . .
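As a rough illustration of the memory argument, the sketch below builds the same five entries at a larger size, once as a dense base matrix and once directly from triplets, and compares their footprints with object.size(). The names n, triplets, dense and sparse and the 1000 x 1000 size are hypothetical, chosen only for this example; the exact byte counts depend on your R version and platform, so none are quoted here.

```r
library(Matrix)

n <- 1000  # hypothetical size, large enough for the difference to show

# (row, column, value) triplets with fixed values
triplets <- data.frame(
  rowIdx = c(3, 2, 8, 1, 7),
  colIdx = c(5, 8, 4, 6, 4),
  val    = c(90, 27, 37, 57, 91)
)

# Dense route: allocate all n * n cells, then fill in the five non-zeros
dense <- matrix(0, nrow = n, ncol = n)
dense[cbind(triplets$rowIdx, triplets$colIdx)] <- triplets$val

# Sparse route: build the compressed matrix straight from the triplets
sparse <- sparseMatrix(
  i = triplets$rowIdx,
  j = triplets$colIdx,
  x = triplets$val,
  dims = c(n, n)
)

# Compare the in-memory size of the two representations
print(object.size(dense), units = "Kb")
print(object.size(sparse), units = "Kb")
```

The dense matrix stores every one of the n * n doubles, while the compressed version stores only the five values plus their index vectors, so going through the dense form first means paying the full cost at least once.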

Shameless plug: I have a blog article covering this and more, if you're interested.

Ben answered Sep 19 '22 18:09