Every invocation of R creates 63 subprocesses
Rscript --vanilla -e 'Sys.sleep(5)' & pstree -p $! | grep -c '{R}'
# 63
where the pstree output looks something like this:
R(2562809)─┬─{R}(2562818)
           ├─{R}(2562819)
           ...
           ├─{R}(2562878)
           ├─{R}(2562879)
           └─{R}(2562880)
Is this expected behavior?
This is a 72-core machine running Debian 9.3 with R 3.4.3, BLAS 3.7.0, and Open MPI 2.0.2:
dpkg-query -l '*blas*' 'r-base' '*lapack*' '*openmp*'|grep ^ii
ii libblas-common 3.7.0-2 amd64 Dependency package for all BLAS implementations
ii libblas-dev 3.7.0-2 amd64 Basic Linear Algebra Subroutines 3, static library
ii libblas3 3.7.0-2 amd64 Basic Linear Algebra Reference implementations, shared library
ii liblapack-dev 3.7.0-2 amd64 Library of linear algebra routines 3 - static version
ii liblapack3 3.7.0-2 amd64 Library of linear algebra routines 3 - shared version
ii libopenblas-base 0.2.19-3 amd64 Optimized BLAS (linear algebra) library (shared library)
ii libopenmpi-dev 2.0.2-2 amd64 high performance message passing library -- header files
ii libopenmpi2:amd64 2.0.2-2 amd64 high performance message passing library -- shared library
ii libopenmpt0:amd64 0.2.7386~beta20.3-3+deb9u2 amd64 module music library based on OpenMPT -- shared library
ii openmpi-bin 2.0.2-2 amd64 high performance message passing library -- binaries
ii openmpi-common 2.0.2-2 all high performance message passing library -- common files
ii r-base 3.4.3-1~stretchcran.0 all GNU R statistical computation and graphics system
R is loading the OpenBLAS and OpenMP (libgomp) libraries:
Rscript --vanilla -e 'Sys.sleep(1)' & lsof -p $! |egrep -i 'blas|lapack|parallel|omp'
[1] 2574896
lsof: WARNING: can't stat() tracefs file system /sys/kernel/debug/tracing
Output information may be incomplete.
R 2574896 foranw mem REG 0,20 13931603 /usr/lib/libopenblasp-r0.2.19.so (path dev=0,21)
R 2574896 foranw mem REG 0,20 13931604 /usr/lib/openblas-base/libblas.so.3 (path dev=0,21)
R 2574896 foranw mem REG 0,20 13840156 /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0 (path dev=0,21)
R is (famously) single-threaded. I suspect the extra threads come from libopenblas-base, which is (also known to be) multi-threaded. Contrast this with our rocker container, which uses libblas3 (single-threaded, not optimized):
> system("pstree")
bash───R───sh───pstree
> system("ps -ax")
PID TTY STAT TIME COMMAND
1 pts/0 Ss 0:00 /bin/bash
579 pts/0 S+ 0:00 /usr/lib/R/bin/exec/R
583 pts/0 S+ 0:00 sh -c ps -ax
584 pts/0 R+ 0:00 ps -ax
>
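To compare the two setups with a single number, the thread count of a process can also be read straight from /proc. This is a minimal sketch, assuming Linux with /proc mounted; count_threads is a hypothetical helper name, not something shipped with R or pstree.

```shell
# count_threads is a hypothetical helper name used for illustration.
# Assumes Linux with /proc mounted: every thread of a process shows up
# as one directory under /proc/<pid>/task.
count_threads() {
    pid="$1"
    if [ ! -d "/proc/$pid/task" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    # One entry per thread, including the main thread.
    ls "/proc/$pid/task" | wc -l | tr -d ' '
}
```

Run against a sleeping R process, it should report one more than the pstree count of {R} entries, because the main thread is included (the sleep 1 is only there to give R time to start):
Rscript --vanilla -e 'Sys.sleep(5)' & sleep 1; count_threads $!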
As Debian maintainer for R, I take advantage of the fact that we have several BLAS / LAPACK builds. The base (reference) build can be OK, OpenBLAS is often faster (but be careful when you then launch multiple cores from R via the different mechanisms, because every R process brings its own set of BLAS threads), and there is also ATLAS. What is "best" will always get a firm "it depends".
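If the OpenBLAS thread pool is indeed the source, one way to stay careful is to cap it per command through the environment. The sketch below assumes the OPENBLAS_NUM_THREADS and OMP_NUM_THREADS variables, which OpenBLAS and OpenMP runtimes such as libgomp read at startup; with_blas_threads is a hypothetical wrapper name.

```shell
# with_blas_threads is a hypothetical wrapper name used for illustration.
# Usage: with_blas_threads <n> <command> [args...]
with_blas_threads() {
    n="$1"
    shift
    # env sets the variables for this one command only, so the calling
    # shell and any other R sessions keep their own settings.
    env OPENBLAS_NUM_THREADS="$n" OMP_NUM_THREADS="$n" "$@"
}
```

Repeating the check from the question under the wrapper should then show a much smaller count, though I have not verified that on the 72-core machine above:
with_blas_threads 1 Rscript --vanilla -e 'Sys.sleep(5)' & sleep 1; pstree -p $! | grep -c '{R}'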