Adding values to a matrix using index vectors that include row and column names

Suppose I have a really big matrix of sparse data, but I'm only interested in a sample of it, which makes it even more sparse. Suppose I also have a data frame of triples with columns for the row/column/value of the data (imported from a CSV file). I know I can use the sparseMatrix() function from library(Matrix) to create a sparse matrix using

sparseMatrix(i=df$row,j=df$column,x=df$value)

However, because of my index values I end up with a sparse matrix that's millions of rows by tens of thousands of columns (most of which are empty, because my subset excludes most of the rows and columns). All of those zero rows and columns end up skewing some of my functions (take clustering, for example: I end up with one cluster that includes the origin when the origin isn't even a valid point). I'd like to perform the same operation, but using i and j as rownames and colnames. I've tried creating a dense matrix, sampling down to the max size, and adding values using

denseMatrix <- matrix(0,nrows,ncols,dimnames=list(unique(df$row),unique(df$column)))
denseMatrix[as.character(df$row),as.character(df$column)]=df$value

(Actually, I've been setting it equal to 1 because I'm not interested in the value in this case.) But I find it fills in the entire matrix, because it takes the cross of all the rows and columns rather than just (row1, col1), (row2, col2), ... Does anybody know a way to accomplish what I'm trying to do? Alternatively, I'd be fine with filling in a sparse matrix and simply having it somehow discard all of the zero rows and columns to compact itself into a denser form (but I'd like to maintain some reference back to the original row and column numbers). I appreciate any suggestions!

Here's an example:

> rows<-c(3,1,3,5)
> cols<-c(2,4,6,6)
> mtx<-sparseMatrix(i=rows,j=cols,x=1)
> mtx
5 x 6 sparse Matrix of class "dgCMatrix"

[1,] . . . 1 . .
[2,] . . . . . .
[3,] . 1 . . . 1
[4,] . . . . . .
[5,] . . . . . 1

I'd like to get rid of columns 1, 3 and 5 as well as rows 2 and 4. This is a pretty trivial example, but imagine if instead of having row numbers 1, 3 and 5 they were 1000, 3000 and 5000. Then there would be a lot more empty rows between them. Here's what happens when I use a dense matrix with named rows/columns:

> dmtx<-matrix(0,3,3,dimnames=list(c(1,3,5),c(2,4,6)))
> dmtx
  2 4 6
1 0 0 0
3 0 0 0
5 0 0 0
> dmtx[as.character(rows),as.character(cols)]=1
> dmtx
  2 4 6
1 1 1 1
3 1 1 1
5 1 1 1
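
The fill above covers every cell because two separate index vectors select the full cross product of rows and columns. A minimal sketch of one way around it, assuming the same `rows` and `cols` as in the example: base R also accepts a single two-column character matrix as the index, where each row is matched against the dimnames as one (row name, column name) pair. The names `row_ids`, `col_ids` and `idx` are only illustrative.

```r
rows <- c(3, 1, 3, 5)
cols <- c(2, 4, 6, 6)

# Name the dense matrix after the distinct row/column ids that actually occur.
row_ids <- sort(unique(rows))
col_ids <- sort(unique(cols))
dmtx <- matrix(0, length(row_ids), length(col_ids),
               dimnames = list(row_ids, col_ids))

# One row of idx per cell: (row name, column name).
# Indexing with this matrix sets only those cells, not the cross product.
idx <- cbind(as.character(rows), as.character(cols))
dmtx[idx] <- 1
dmtx
```

With this form only the four listed cells should end up as 1, and the dimnames still carry the original row and column numbers.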
dscheffy asked Aug 23 '11 16:08



1 Answer

When you say "get rid of" certain columns/rows, do you mean just this:

> mtx[-c(2,4), -c(1,3,5)]
3 x 3 sparse Matrix of class "dgCMatrix"

[1,] . 1 .
[2,] 1 . 1
[3,] . . 1

Subsetting works, so you just need a way of finding out which rows and columns are empty? If that is correct, then you can use colSums() and rowSums(), as these have been enhanced by the Matrix package to have appropriate methods for sparse matrices. This should preserve the sparseness during the operation:

> dimnames(mtx) <- list(letters[1:5], LETTERS[1:6])
> mtx[which(rowSums(mtx) != 0), which(colSums(mtx) != 0)]
3 x 3 sparse Matrix of class "dgCMatrix"
  B D F
a . 1 .
c 1 . 1
e . . 1

or, perhaps safer, indexing with the logical vectors directly:

> mtx[rowSums(mtx) != 0, colSums(mtx) != 0]
3 x 3 sparse Matrix of class "dgCMatrix"
  B D F
a . 1 .
c 1 . 1
e . . 1
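
If the full-size matrix is never needed, the empty rows and columns could also be avoided at construction time. The following is a sketch rather than part of the approach above: it assumes the `rows`/`cols` vectors from the question, remaps them to consecutive positions with `match()`, and stores the original numbers as dimnames through the `dimnames` argument of `sparseMatrix()`. The names `row_ids`, `col_ids` and `compact` are illustrative.

```r
library(Matrix)

rows <- c(3, 1, 3, 5)
cols <- c(2, 4, 6, 6)

# Distinct ids that actually occur, in increasing order.
row_ids <- sort(unique(rows))
col_ids <- sort(unique(cols))

# match() maps each original id to its position among the distinct ids,
# so the resulting matrix has no all-zero rows or columns.
compact <- sparseMatrix(i = match(rows, row_ids),
                        j = match(cols, col_ids),
                        x = 1,
                        dimnames = list(as.character(row_ids),
                                        as.character(col_ids)))
compact

# The original row numbers can be recovered from the dimnames.
as.integer(rownames(compact))
```

Because the dimnames hold the original indices, any result computed on `compact` (cluster membership, for instance) can be traced back to the original row and column numbers.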
Gavin Simpson answered Sep 23 '22 01:09