I am trying to run a QR decomposition (LAPACKE_dgeqrf) in R on a Linux machine (CentOS) using a C++ program that is interfaced with Rcpp. Unfortunately, top shows only 100% CPU usage, i.e. only one core is busy. This also happens on a Red Hat Enterprise Linux Server. However, the same C++ program (with LAPACKE_dgeqrf) runs at nthreads * 100% when started from the terminal, outside of R. I compiled OpenBLAS with
NO_AFFINITY=1
and tried
export OPENBLAS_NUM_THREADS=4
export GOTO_NUM_THREADS=4
export OMP_NUM_THREADS=4
export OPENBLAS_MAIN_FREE=1
Nothing works. Everything works fine on a Mac though. 'mcaffinity()' from the parallel R package returns NULL. I configured R using
configure 'CFLAGS=-g -O3 -Wall -pedantic' 'CXXFLAGS=-g -O3 -Wall -pedantic' 'FCFLAGS=-g -O3' 'F77FLAGS=-g -O3' '--with-system-zlib' '--enable-memory-profiling'
My C++ code:
#include <algorithm>
#include <Rcpp.h>
#include <lapacke.h>
// [[Rcpp::export]]
Rcpp::NumericMatrix QRopenblas(Rcpp::NumericMatrix X)
{
    // Work on a copy so the caller's matrix in R is not overwritten in place
    Rcpp::NumericMatrix A = Rcpp::clone(X);
    int n_rows = A.nrow(), n_cols = A.ncol(), min_mn = std::min(n_rows, n_cols);
    // Scalar factors of the elementary reflectors
    Rcpp::NumericVector tau(min_mn);
    // Perform QR decomposition (R matrices are column-major)
    LAPACKE_dgeqrf(LAPACK_COL_MAJOR, n_rows, n_cols, A.begin(), n_rows, tau.begin());
    return A;
}
My R code:
PKG_LIBS <- '/pathto/openblas/lib/libopenblas.a'
PKG_CPPFLAGS <- '-I/pathto/openblas/include'
Sys.setenv(PKG_LIBS = PKG_LIBS , PKG_CPPFLAGS = PKG_CPPFLAGS)
Rcpp::sourceCpp('/pathto/QRopenblas.cpp', rebuild = TRUE)
n_row <- 4000
n_col <- 4000
A <- matrix(rnorm(n_row * n_col), n_row, n_col)
res <- QRopenblas(A)
I found a solution by rebuilding R and configuring it using
../configure --enable-BLAS-shlib --enable-R-shlib --enable-memory-profiling --with-tcltk=no
Afterwards, I had to replace libRblas.so with the corresponding OpenBLAS file libopenblas.so. By the way, I built OpenBLAS with standard settings (i.e. with affinity). The R function qr() now uses all cores, and so do the C++ programs. The reason this works is that R is now launched with multiple threads at startup (as verified with cat /proc/pid/status). Without replacing libRblas.so, R is launched with one thread, and the threads that OpenBLAS starts when it is first called are presumably all pinned to the first core.
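The replacement and the check described above can be sketched roughly as follows. This is only an illustration under some assumptions: the function names swap_rblas and proc_threads are made up for this example, the library directory and the OpenBLAS path are passed in as arguments rather than guessed, and the check relies on the Threads and Cpus_allowed_list fields of /proc/<pid>/status, so it is Linux-only.

```shell
#!/bin/sh
# Hypothetical helper: replace R's BLAS shared library with OpenBLAS.
# $1 = directory containing libRblas.so (assumed to be R's lib directory)
# $2 = path to libopenblas.so (preferably absolute, since it becomes a symlink target)
swap_rblas() {
    r_lib_dir="$1"
    openblas_so="$2"
    if [ ! -f "$r_lib_dir/libRblas.so" ]; then
        echo "no libRblas.so in $r_lib_dir" >&2
        return 1
    fi
    if [ ! -f "$openblas_so" ]; then
        echo "OpenBLAS library not found: $openblas_so" >&2
        return 1
    fi
    if [ -e "$r_lib_dir/libRblas.so.orig" ]; then
        echo "backup libRblas.so.orig already exists, not overwriting" >&2
        return 1
    fi
    # Keep the original so the change can be undone, then link OpenBLAS in its place
    mv "$r_lib_dir/libRblas.so" "$r_lib_dir/libRblas.so.orig" || return 1
    ln -s "$openblas_so" "$r_lib_dir/libRblas.so"
}

# Hypothetical helper: show how many threads a process has and which CPUs
# it may run on, read from /proc/<pid>/status.
# $1 = process id (for example, the pid of the running R session)
proc_threads() {
    grep -E '^(Threads|Cpus_allowed_list):' "/proc/$1/status"
}
```

With R built using --enable-BLAS-shlib, the first argument would typically be the lib directory under the path printed by R RHOME, e.g. swap_rblas "$(R RHOME)/lib" /pathto/openblas/lib/libopenblas.so. Running proc_threads with the pid of the R session (Sys.getpid() in R) should then report more than one thread, and a Cpus_allowed_list that covers only a single core would point to an affinity problem.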