Index values from a matrix using row, col indices

Tags:

indexing

r

r-faq

matrix

I have a 2D matrix mat with 500 rows × 335 columns, and a data.frame dat with 120425 rows. The data.frame dat has two columns, I and J, which hold integer row and column indices into mat. I would like to add the corresponding values from mat to the rows of dat.

Here is my conceptual fail:

> dat$matval <- mat[dat$I, dat$J]
Error: cannot allocate vector of length 1617278737

(I am using R 2.13.1 on Win32). Digging a bit deeper, I see that I'm misusing matrix indexing, as it appears that I'm only getting a sub-matrix of mat, and not a single-dimension array of values as I expected, i.e.:

> str(mat[dat$I[1:100], dat$J[1:100]])
 int [1:100, 1:100] 20 1 1 1 20 1 1 1 1 1 ...

I was expecting something like int [1:100] 20 1 1 1 20 1 1 1 1 1 .... What is the correct way to index a 2D matrix using indices of row, column to get the values?

396

asked Aug 03 '11 00:08

Mike T

2 Answers

Almost. The indices need to be offered to "[" as a two-column matrix:

dat$matval <- mat[ cbind(dat$I, dat$J) ] # should do it.

There is a caveat: although this also works for data frames, they are first coerced to a matrix, and if any column is non-numeric, the entire matrix takes on the "lowest common denominator" class.
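To make the difference concrete, here is a small sketch on made-up data (the 3 × 4 matrix, the three index pairs, and the data frame df are illustrative assumptions, not the data from the question). It contrasts sub-matrix selection with two-column matrix indexing, then shows the caveat under one reading of it: when the object being indexed is itself a data frame with a non-numeric column, the result comes back as character.

```r
# Illustrative data only: a 3 x 4 integer matrix, filled column-wise
mat <- matrix(1:12, nrow = 3)
dat <- data.frame(I = c(1L, 3L, 2L), J = c(1L, 2L, 4L))

# Two separate index vectors pair every I with every J: a 3 x 3 sub-matrix
sub <- mat[dat$I, dat$J]
dim(sub)

# A two-column matrix picks one element per (row, column) pair
dat$matval <- mat[cbind(dat$I, dat$J)]
dat$matval

# Caveat (hypothetical data frame): matrix-indexing a data frame coerces it
# to a matrix first, so one character column makes every value character
df <- data.frame(num = c(1.5, 2.5), chr = c("a", "b"))
picked <- df[cbind(1:2, c(1L, 1L))]   # both picks come from the numeric column
is.character(picked)
```

With these values, mat[1, 1], mat[3, 2] and mat[2, 4] are the three elements selected, so dat$matval has length 3 rather than 9.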

173

answered Oct 07 '22 23:10

IRTFM

Using a matrix to index as DWin suggests is of course much cleaner, but for some strange reason doing it manually using 1-D indices is actually slightly faster:

# Huge sample data
mat <- matrix(sin(1:1e7), ncol=1000)
dat <- data.frame(I=sample.int(nrow(mat), 1e7, rep=T),
                  J=sample.int(ncol(mat), 1e7, rep=T))

system.time( x <- mat[cbind(dat$I, dat$J)] )     # 0.51 seconds
system.time( mat[dat$I + (dat$J-1L)*nrow(mat)] ) # 0.44 seconds

The dat$I + (dat$J-1L)*nrow(mat) part turns the 2-D indices into 1-D ones; this works because R stores matrices in column-major order. The 1L suffix specifies an integer instead of a double value, which avoids some coercions.
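As a quick check of that arithmetic, the sketch below uses a tiny made-up matrix (the values and the names lin, by_linear and by_matrix are illustrative assumptions) to confirm that the 1-D index selects the same elements as the two-column matrix index, and that using 1L keeps the index in integer storage.

```r
# Illustrative data only: a 3 x 4 integer matrix, filled column-wise
mat <- matrix(1:12, nrow = 3)
I <- c(1L, 3L, 2L)
J <- c(1L, 2L, 4L)

# Column-major layout: element (i, j) sits at position i + (j - 1) * nrow
lin <- I + (J - 1L) * nrow(mat)

by_linear <- mat[lin]
by_matrix <- mat[cbind(I, J)]
identical(by_linear, by_matrix)

# With 1L the index stays integer; with a plain 1 it is promoted to double
typeof(I + (J - 1L) * nrow(mat))
typeof(I + (J - 1) * nrow(mat))
```

Both forms index the same elements; whether the integer version is measurably faster will depend on the R version and the size of the data.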

...I also tried gsk3's apply-based solution. It's almost 500x slower though:

system.time( apply( dat, 1, function(x,mat) mat[ x[1], x[2] ], mat=mat ) ) # 212

answered Oct 07 '22 22:10

Tommy
