Does anyone have suggestions of how I could vectorise this code or otherwise speed it up? I'm creating a matrix, potentially very large. In each row, I want to select 2 columns at random, and flip them from 0 to 1.
I cannot select the same row and column number, i.e. the diagonal of the matrix will be zero hence the (1:N)[-j] in sample(). Because this changes with each row, I can't see a way to do this by using vectorisation, but parallelisation could be an option here?
I use library(Matrix) for sparse matrix functionality.
library(Matrix)
N <- 100
m <- Matrix(0, nrow = N, ncol = N)
for(j in 1:N) {
cols <- sample((1:N)[-j], 2) #Choose 2 columns not equal to the
m[j, cols] <- 1
}
Any ideas?
library(Matrix)
N <- 7
desired_output <- Matrix(0, nrow = N, ncol = N)
set.seed(1)
for(j in 1:N) {
cols <- sample((1:N)[-j], 2) #Choose 2 columns not equal to the
desired_output[j, cols] <- 1
}
# 7 x 7 sparse Matrix of class "dgCMatrix"
#
# [1,] . . 1 . . . 1
# [2,] . . . . 1 1 .
# [3,] . 1 . . . 1 .
# [4,] . . . . 1 . 1
# [5,] 1 . . 1 . . .
# [6,] 1 1 . . . . .
# [7,] . 1 . . 1 . .
res <- Matrix(0, nrow = N, ncol = N)
set.seed(1)
ind <- cbind(rep(1:N, each = 2), c(sapply(1:N, function(j) sample((1:N)[-j], 2))))
res[ind] <- 1
all.equal(res, desired_output)
# [1] TRUE
Quick bench:
microbenchmark::microbenchmark(
OP = {
desired_output <- Matrix(0, nrow = N, ncol = N)
set.seed(1)
for(j in 1:N) {
cols <- sample((1:N)[-j], 2) #Choose 2 columns not equal to the
desired_output[j, cols] <- 1
}
},
Aurele = {
res <- Matrix(0, nrow = N, ncol = N)
set.seed(1)
ind <- cbind(rep(1:N, each = 2), c(sapply(1:N, function(j) sample((1:N)[-j], 2))))
res[ind] <- 1
}
)
# Unit: milliseconds
# expr min lq mean median uq max neval cld
# OP 10.240969 10.509384 11.065336 10.804949 11.044846 14.903377 100 b
# Aurele 1.185001 1.258037 1.392021 1.363503 1.434818 4.553614 100 a
EDIT: I've edited my answer to make it simpler and to include the way R and RcppArmadillo make the sampling. And now it seems to be linear with N (as I thought it would be).
There are two "problems" in your code:
sample((1:N)[-j], 2) is uncessary allocating memory, making your solution quadratic with N. The solution would be to use rejection sampling as N is large (so rejection will not occur often).It is true that because you have sample with no replacement, it is not straightforward to make a vectorized solution for your problem. But again, it would be possible by using rejection sampling. Here I prefer an Rcpp solution:
Rcpp:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerMatrix sample2(int N) {
IntegerMatrix res(2 * N, 2);
int j, ind1, ind2;
for (j = 0; j < N; j++) {
res(2 * j, 0) = res(2 * j + 1, 0) = j + 1;
// sample first one
do {
ind1 = N * unif_rand();
} while (ind1 == j);
res(2 * j, 1) = ind1 + 1;
// sample second one
do {
ind2 = N * unif_rand();
} while (ind2 == j || ind2 == ind1);
res(2 * j + 1, 1) = ind2 + 1;
}
return res;
}
R:
# table(replicate(1e5, sample2(5))) # Verify that the sampling is OK
library(Matrix)
N <- 1000
m <- Matrix(0, nrow = N, ncol = N)
m[sample2(N)] <- 1
Benchmark:
microbenchmark::microbenchmark(
OP = {
desired_output <- Matrix(0, nrow = N, ncol = N)
for(j in 1:N) {
cols <- sample((1:N)[-j], 2) #Choose 2 columns not equal to the
desired_output[j, cols] <- 1
}
},
Aurele = {
res <- Matrix(0, nrow = N, ncol = N)
ind <- cbind(rep(1:N, each = 2), c(sapply(1:N, function(j) sample((1:N)[-j], 2))))
res[ind] <- 1
},
privefl = {
m <- Matrix(0, nrow = N, ncol = N)
m[sample2(N)] <- 1
},
times = 20
)
Results with N = 1000:
Unit: milliseconds
expr min lq mean median uq max neval
OP 599.996226 605.868229 618.479868 615.653853 625.908794 679.292360 20
Aurele 12.315795 12.633971 14.183891 13.148149 15.118948 19.649716 20
privefl 1.401824 1.493371 1.649015 1.622826 1.704273 2.520239 20
Results with N = 10,000:
Unit: milliseconds
expr min lq mean median uq max neval
Aurele 812.018743 845.434915 903.387191 863.851661 967.08294 1084.738882 20
privefl 3.657525 4.083799 4.409226 4.239576 4.49501 6.413498 20
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With