 

Index values from a matrix using row, col indices

I have a 2D matrix mat with 500 rows × 335 columns, and a data.frame dat with 120425 rows. The data.frame dat has two integer columns, I and J, which hold row and column indices into mat. I would like to add the corresponding value from mat to each row of dat.

Here is my conceptual fail:

> dat$matval <- mat[dat$I, dat$J]
Error: cannot allocate vector of length 1617278737

(I am using R 2.13.1 on Win32). Digging a bit deeper, I see that I'm misusing matrix indexing, as it appears that I'm only getting a sub-matrix of mat, and not a single-dimension array of values as I expected, i.e.:

> str(mat[dat$I[1:100], dat$J[1:100]])
 int [1:100, 1:100] 20 1 1 1 20 1 1 1 1 1 ...

I was expecting something like int [1:100] 20 1 1 1 20 1 1 1 1 1 .... What is the correct way to index a 2D matrix using indices of row, column to get the values?
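To see why the first attempt runs out of memory, here is a minimal sketch on a toy matrix (the names `m`, `i`, `j` and their values are illustrative, not from the question). With two index vectors, `[` returns every combination of the requested rows and columns, so the result has `length(i) * length(j)` cells rather than `length(i)` values. For the question's data that would be roughly 1.45e10 cells.

```r
# Toy 4 x 3 matrix, filled column by column with 1..12
m <- matrix(1:12, nrow = 4)

# Illustrative row and column index vectors of equal length
i <- c(1L, 2L, 4L)
j <- c(3L, 1L, 2L)

# m[i, j] selects the full cross product of rows i and columns j
sub <- m[i, j]
dim(sub)      # 3 3: a 3 x 3 sub-matrix, not 3 values

# Number of cells the same call would need for the question's 120425 pairs
120425^2
```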

asked Aug 03 '11 at 00:08 by Mike T




2 Answers

Almost. The indices need to be passed to "[" as a two-column matrix:

dat$matval <- mat[ cbind(dat$I, dat$J) ] # should do it. 
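A small sketch of the same idea on toy data (the matrix `m` and data frame `d` are hypothetical stand-ins for `mat` and `dat`): each row of the two-column index matrix is read as one (row, column) pair, so the result is a plain vector with one value per row of `d`.

```r
# Toy 4 x 3 matrix, filled column by column with 1..12
m <- matrix(1:12, nrow = 4)

# Hypothetical stand-in for `dat`: one (I, J) pair per row
d <- data.frame(I = c(1L, 2L, 4L), J = c(3L, 1L, 2L))

# Matrix indexing picks one element per pair: m[1,3], m[2,1], m[4,2]
d$matval <- m[cbind(d$I, d$J)]
d$matval   # 9 2 8
```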

There is a caveat: although this also works for data frames, the data frame is first coerced to a matrix, and if any of its columns are non-numeric, the entire matrix is coerced to the "lowest common denominator" class (e.g. character).
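A minimal sketch of that caveat, assuming a hypothetical data frame `df` with one integer and one character column: both requested cells come from the integer column, yet the values come back as strings because the whole data frame was converted to a character matrix before indexing.

```r
# Hypothetical data frame mixing an integer and a character column
df <- data.frame(num = c(10L, 20L), chr = c("a", "b"),
                 stringsAsFactors = FALSE)

# Both (row, col) pairs point into the integer column `num`
idx <- cbind(c(1L, 2L), c(1L, 1L))
res <- df[idx]

# The data frame was coerced to a character matrix first,
# so the numbers are returned as strings
res          # "10" "20"
class(res)   # "character"
```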

answered Oct 07 '22 at 23:10 by IRTFM


Using a matrix to index as DWin suggests is of course much cleaner, but for some strange reason doing it manually using 1-D indices is actually slightly faster:

# Huge sample data
mat <- matrix(sin(1:1e7), ncol=1000)
dat <- data.frame(I=sample.int(nrow(mat), 1e7, replace=TRUE),
                  J=sample.int(ncol(mat), 1e7, replace=TRUE))

system.time( x <- mat[cbind(dat$I, dat$J)] )     # 0.51 seconds
system.time( mat[dat$I + (dat$J-1L)*nrow(mat)] ) # 0.44 seconds

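The dat$I + (dat$J-1L)*nrow(mat) part turns the 2-D indices into 1-D ones. This works because R stores a matrix in column-major order, so element [i, j] sits at position i + (j-1)*nrow(mat). The 1L specifies an integer instead of a double value, which avoids some coercions.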
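A quick check of that equivalence on a small random matrix (the sizes, seed and the names `I`, `J`, `lin` are illustrative assumptions): the linear positions computed by hand select exactly the same elements as the two-column matrix index.

```r
# Small toy matrix and random (row, col) pairs; sizes are illustrative
set.seed(1)
mat <- matrix(sin(1:60), nrow = 6)   # 6 x 10
I <- sample.int(nrow(mat), 20, replace = TRUE)
J <- sample.int(ncol(mat), 20, replace = TRUE)

# Column-major storage: element [i, j] sits at position i + (j - 1) * nrow
lin <- I + (J - 1L) * nrow(mat)

# Both forms extract the same values
via_matrix <- mat[cbind(I, J)]
via_linear <- mat[lin]
identical(via_matrix, via_linear)   # TRUE

# Single-element case: [2, 3] is position 2 + 2 * 6 = 14
mat[2, 3] == mat[14]                # TRUE
```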

...I also tried gsk3's apply-based solution. It's almost 500x slower though:

system.time( apply( dat, 1, function(x,mat) mat[ x[1], x[2] ], mat=mat ) ) # 212 seconds
answered Oct 07 '22 at 22:10 by Tommy