R system() cannot allocate memory even though the same command can be run from a terminal




I have an issue with the R system() function (for running an OS command from within R) that only arises when the R session uses more than some fraction of the available RAM (maybe ~75% in my case). It happens even though plenty of RAM is still free (~15GB in my case) and the same OS command runs without trouble from a terminal at the same time.

System info:
64GB RAM PC (local desktop PC, not cloud-based or cluster)
Ubuntu 18.04.1 LTS - x86_64-pc-linux-gnu (64-bit)
R version 3.5.2 (executed directly, not e.g. via docker)

This example demonstrates the issue. The size of the data frame d needs to be adjusted to be as small as possible and still provoke the error. This will depend on how much RAM you have and what else is running at the same time.

ross@doppio:~$ R

R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> n <- 5e8
> d <- data.frame(
+   v0 = rep_len(1.0, n),
+   v1 = rep_len(1.0, n),
+   v2 = rep_len(1.0, n),
+   v3 = rep_len(1.0, n),
+   v4 = rep_len(1.0, n),
+   v5 = rep_len(1.0, n),
+   v6 = rep_len(1.0, n),
+   v7 = rep_len(1.0, n),
+   v8 = rep_len(1.0, n),
+   v9 = rep_len(1.0, n)
+ )

> dim(d)
[1] 500000000        10

> gc()
             used    (Mb) gc trigger    (Mb)   max used    (Mb)
Ncells     260857    14.0     627920    33.6     421030    22.5
Vcells 5000537452 38151.1 6483359463 49464.2 5000559813 38151.3

> system("free -m", intern = FALSE)
Warning messages:
1: In system("free -m", intern = FALSE) :
  system call failed: Cannot allocate memory
2: In system("free -m", intern = FALSE) : error in running command

The call to gc() indicates that R has allocated ~38GB of the 64GB of RAM, and running free -m in a terminal at the same time (see below) shows that the OS thinks ~16GB is free.

ross@doppio:~$ free -m
              total        used        free      shared  buff/cache   available
Mem:          64345       44277       15904         461        4162       18896
Swap:           975           1         974

So free -m can't be run from within R because memory cannot be allocated, but free -m can be run at the same time from a terminal, and you would think that 15GB would be enough to run a light-weight command like free -m.

If the R memory usage is below some threshold then free -m can be run from within R.

I guess that R is trying to allocate more memory for free -m than it actually needs, and that the amount depends on how much memory R has already allocated. Can anyone shed some light on what is going on here?


1 Answer

I've run into this one. R calls fork to start the subprocess, which temporarily asks for room for a second copy of the ~38GB image, and two copies come to more than the 64GB you have. Had the fork succeeded, the child would next have called exec and given back the duplicated memory. This isn't how fork/exec is supposed to go: fork is supposed to be copy-on-write with no extra cost, but somehow it fails in this case anyway.
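
To see how close a session is to this situation without calling system() (which would itself fork), one option is to read the kernel's counters straight from /proc. The sketch below is only a rough approximation of the idea described here, not the kernel's exact accounting rule. It assumes a Linux /proc filesystem, and parse_kb and fork_headroom_kb are hypothetical helper names introduced for illustration.

```r
# Parse "Key:   12345 kB" lines (the format used by /proc/meminfo and
# /proc/self/status) into a named numeric vector of kB values.
# Lines without a "kB" value are ignored.
parse_kb <- function(lines) {
  pattern <- "^([A-Za-z0-9_()]+):[[:space:]]+([0-9]+) kB"
  m <- regmatches(lines, regexec(pattern, lines))
  m <- m[lengths(m) == 3]
  keys <- vapply(m, `[`, "", 2)
  values <- as.numeric(vapply(m, `[`, "", 3))
  stats::setNames(values, keys)
}

# Rough estimate (an assumption, not the kernel's rule): memory that would be
# left if a full second copy of this process had to be backed by
# available RAM plus free swap. A negative value suggests fork may be refused.
fork_headroom_kb <- function(status_lines, meminfo_lines) {
  st <- parse_kb(status_lines)
  mi <- parse_kb(meminfo_lines)
  unname(mi["MemAvailable"] + mi["SwapFree"] - st["VmSize"])
}

# Only attempt the live check where /proc exists (Linux).
if (file.exists("/proc/self/status") && file.exists("/proc/meminfo")) {
  headroom <- fork_headroom_kb(readLines("/proc/self/status"),
                               readLines("/proc/meminfo"))
  message(sprintf("Estimated fork headroom: %.0f MB", headroom / 1024))
}
```

VmSize overstates what fork actually has to account for (it includes shared libraries, for example), so treat a small positive headroom as a warning rather than a guarantee.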

It looks like this may be a known behaviour: to fork, you must have enough memory to potentially duplicate the pages, even if that duplication never happens. I would guess you do not have enough swap (at least the size of RAM seems to be recommended). Here are some instructions on configuring swap (written for EC2, but they cover Linux in general): https://aws.amazon.com/premiumsupport/knowledge-center/ec2-memory-swap-file/
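
Whether the kernel insists on that potential duplication depends on its overcommit policy, which can also be read from inside R without forking. This is a minimal sketch assuming a Linux system that exposes /proc/sys/vm/overcommit_memory. The helper name describe_overcommit is hypothetical, and the three mode descriptions are a paraphrase of the kernel documentation.

```r
# Map the value of vm.overcommit_memory to a short description.
# 0, 1 and 2 are the modes documented by the Linux kernel.
describe_overcommit <- function(mode) {
  switch(as.character(mode),
    "0" = "heuristic overcommit (kernel default)",
    "1" = "always overcommit",
    "2" = "strict accounting (no overcommit)",
    "unknown"
  )
}

# Reading the file does not fork, so this works even when system() fails.
path <- "/proc/sys/vm/overcommit_memory"
if (file.exists(path)) {
  mode <- trimws(readLines(path, n = 1))
  message("vm.overcommit_memory = ", mode, ": ", describe_overcommit(mode))
}
```

Adding swap, as suggested above, raises the amount of memory the kernel is willing to promise. Changing the overcommit mode is the other knob, but it affects every process on the machine, so check the kernel documentation before touching it.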

