
R efficient way to use values as indexes

Tags: performance, r

I have a matrix with 10 million rows of integer values.

A row in this matrix can look as follows:

1 1 1 1 2

I need to transform the row above into the following vector of counts, where position k holds the number of times the value k appears in the row:

4 1 0 0 0 0 0 0 0

Another example:

1 2 3 4 5

becomes:

1 1 1 1 1 0 0 0 0

How can I do this efficiently in R?

Update: There is a function that does exactly what I need, base::tabulate (suggested here before), but it is extremely slow: it took at least 15 minutes to go over my initial matrix.

YevgenyM asked Jan 18 '26 10:01

1 Answer

I would try something like this:

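m <- nrow(x)
n <- ncol(x)
i.idx <- seq_len(m)   # row indices 1..m
j.idx <- seq_len(n)   # column indices 1..n

# One output column per possible value; assumes x holds positive integers
out <- matrix(0L, m, max(x))

for (j in j.idx) {
   # Pair each row index with the value found in column j of that row
   ij <- cbind(i.idx, x[, j])
   # Increment the matching count cells for all rows at once
   out[ij] <- out[ij] + 1L
}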

A for loop might sound surprising for a question that asks for an efficient implementation. However, this solution is vectorized for a given column and only loops through five columns. This will be many, many times faster than looping over 10 million rows using apply.
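To see why the per-column update works, here is a minimal sketch on the two example rows from the question. Indexing a matrix with a two-column matrix addresses one (row, column) cell per index row, and because each row number appears exactly once per pass, no cell is updated twice within a single assignment. The output width of 9 is an assumption taken from the question's expected vectors.

```r
# The two example rows from the question
x <- matrix(c(1L, 1L, 1L, 1L, 2L,
              1L, 2L, 3L, 4L, 5L), nrow = 2, byrow = TRUE)

# Assumption: values lie in 1..9, matching the 9-element output in the question
out <- matrix(0L, nrow(x), 9L)

for (j in seq_len(ncol(x))) {
  # Each row of ij is a (row, value) pair that addresses one cell of out
  ij <- cbind(seq_len(nrow(x)), x[, j])
  out[ij] <- out[ij] + 1L
}

out
```

The first row of out is 4 1 0 0 0 0 0 0 0 and the second is 1 1 1 1 1 0 0 0 0, which matches the vectors requested in the question.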

Testing with:

n <- 1e7
m <- 5
x <- matrix(sample(1:9, n * m, replace = TRUE), n, m)

this approach takes less than six seconds while a naive t(apply(x, 1, tabulate, 9)) takes close to two minutes.
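For anyone who wants to reproduce the comparison, the sketch below runs both approaches on a smaller matrix and checks that they agree. The helper name count_by_column, the seed, and the reduced row count are assumptions made for illustration; timings will vary by machine, so none are shown.

```r
set.seed(1)   # arbitrary seed, only for reproducibility
n <- 1e5      # far fewer rows than the 1e7 used above, to keep the check quick
m <- 5
x <- matrix(sample(1:9, n * m, replace = TRUE), n, m)

# Hypothetical helper wrapping the column-wise loop; k is the number of bins
count_by_column <- function(x, k = max(x)) {
  out <- matrix(0L, nrow(x), k)
  i.idx <- seq_len(nrow(x))
  for (j in seq_len(ncol(x))) {
    ij <- cbind(i.idx, x[, j])
    out[ij] <- out[ij] + 1L
  }
  out
}

t_loop  <- system.time(fast  <- count_by_column(x, 9L))
t_apply <- system.time(naive <- t(apply(x, 1, tabulate, 9)))

all(fast == naive)   # both approaches should produce the same counts
rbind(loop = t_loop, apply = t_apply)[, "elapsed", drop = FALSE]
```

Passing the bin count explicitly (9 here) rather than relying on max(x) keeps the output width fixed even when the largest value happens not to occur in the data.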

flodel answered Jan 20 '26 01:01
