
How to allow h2o to access all available memory?

I am running H2O through RStudio Server on a Linux server with 64 GB of RAM. When I initialize the cluster, it says that the total cluster memory is only 9.78 GB. I have tried using the max_mem_size parameter, but it is still only using 9.78 GB.

localH2O <<- h2o.init(ip = "localhost", port = 54321, nthreads = -1, max_mem_size = "25g")
H2O is not running yet, starting it now...
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
    Connection successful!
    R is connected to the H2O cluster: 
        H2O cluster uptime:         5 hours 10 minutes 
        H2O cluster version:        3.10.4.6 
        H2O cluster version age:    19 days  
        H2O cluster name:           H2O_started_from_R_miweis_mxv543 
        H2O cluster total nodes:    1 
        H2O cluster total memory:   9.78 GB 
        H2O cluster total cores:    16 
        H2O cluster allowed cores:  16 
        H2O cluster healthy:        TRUE 
        H2O Connection ip:          localhost 
        H2O Connection port:        54321 
        H2O Connection proxy:       NA 
        H2O Internal Security:      FALSE 
        R Version:                  R version 3.3.3 (2017-03-06) 

I ran the following on the server to check the amount of memory available:

cat /proc/meminfo
MemTotal:       65806476 kB
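The same check can be done from inside R by reading /proc/meminfo and converting MemTotal to GiB. This is a minimal sketch that assumes a Linux host; the helper name mem_total_gib is hypothetical. The kernel reports these values in units of 1024 bytes despite the "kB" label, so the figure above works out to roughly 62.8 GiB, in line with the 64 GB installed and far more than the 9.78 GB that H2O reports.

```r
# Hypothetical helper: total RAM in GiB, taken from the MemTotal line of /proc/meminfo.
# The kernel reports these values in units of 1024 bytes, even though it labels them "kB".
mem_total_gib <- function(lines = readLines("/proc/meminfo")) {
  line <- grep("^MemTotal:", lines, value = TRUE)[1]  # NA if there is no MemTotal line
  kib <- as.numeric(gsub("[^0-9]", "", line))         # keep only the digits
  kib / 1024^2                                        # KiB -> GiB
}

# Using the value reported in the question instead of the live file:
print(round(mem_total_gib("MemTotal:       65806476 kB"), 2))  # 62.76
```

On the server itself, calling mem_total_gib() with no arguments reads the live file.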

EDIT:

I looked into this issue further, and it seems to be a default within the JVM. When I started H2O directly with Java, I was able to pass -Xmx32g and the memory did increase. I could then connect to that H2O instance from RStudio and use the increased memory. Is there a way to change this default in the JVM and allow more memory, so that I don't have to start the H2O instance from the command line first and then connect to it from RStudio Server?

asked May 16 '17 at 20:05 by mikew



2 Answers

The max_mem_size argument in the h2o R package works, so you can use it to start an H2O cluster of whatever size you want; you don't need to start it from the command line using -Xmx.

What seems to be happening in your case is that you are connecting to an existing H2O cluster at localhost:54321 that was started with a "10G" limit (which shows up as 9.78 GB). When you run h2o.init() from R, it simply connects to that existing cluster, whose memory is already fixed, rather than starting a new H2O cluster with the memory you specified in max_mem_size, so the memory request is ignored.

To fix, you should do one of the following:

  • Kill the existing H2O cluster at localhost:54321 and restart from R with the desired memory requirement, or
  • start a cluster from R at a different IP/port than the one that's already running.
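Both fixes can be sketched in R as follows. This is a sketch under assumptions: the wrapper names restart_h2o_with_memory and start_h2o_on_port are hypothetical, the 5-second pause is arbitrary, and port 54323 is only an example. As far as I know, H2O also uses the next port up (port + 1) internally, which is why the second option skips 54322. Keep in mind that h2o.shutdown() stops the cluster for every session connected to it.

```r
# Option 1 (hypothetical wrapper): shut down the cluster already running on `port`,
# then launch a fresh one so that max_mem_size actually reaches the new JVM.
restart_h2o_with_memory <- function(mem = "25g", port = 54321) {
  # Attach to the existing cluster without trying to start a new JVM.
  h2o::h2o.init(ip = "localhost", port = port, startH2O = FALSE)
  h2o::h2o.shutdown(prompt = FALSE)
  Sys.sleep(5)  # give the old JVM time to exit and release the port
  h2o::h2o.init(ip = "localhost", port = port, nthreads = -1, max_mem_size = mem)
}

# Option 2 (hypothetical wrapper): leave the old cluster alone and start a new one
# on a different port (54323 here, since 54322 may be used internally by the old one).
start_h2o_on_port <- function(mem = "25g", port = 54323) {
  h2o::h2o.init(ip = "localhost", port = port, nthreads = -1, max_mem_size = mem)
}
```

After calling restart_h2o_with_memory("25g"), the summary printed by h2o.init() should show a total cluster memory close to 25 GB rather than 9.78 GB.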
answered Sep 30 '22 at 10:09 by Erin LeDell


When starting H2O with h2o.init(), you want to specify the argument min_mem_size=.

This forces H2O to use at least that amount of memory. max_mem_size= prevents H2O from using more than that amount of memory.
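A minimal sketch of this suggestion, assuming the h2o R package is installed: setting both bounds to the same value requests a fixed-size heap. My understanding is that the package passes min_mem_size and max_mem_size to the JVM as -Xms and -Xmx respectively, but treat that mapping as an assumption. The wrapper name start_h2o_fixed_heap is hypothetical. As with max_mem_size alone, these arguments only take effect when h2o.init() actually starts a new JVM; if it attaches to a cluster that is already running, they are ignored.

```r
# Hypothetical wrapper: start a local H2O cluster whose JVM heap is pinned to `size`.
# min_mem_size sets the lower bound and max_mem_size the upper bound on heap size.
start_h2o_fixed_heap <- function(size = "25g", port = 54321) {
  h2o::h2o.init(ip = "localhost", port = port, nthreads = -1,
                min_mem_size = size, max_mem_size = size)
}
```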

answered Sep 30 '22 at 11:09 by Clem Wang