Python H2O Memory Management

Similar to a related question about R, I run into out-of-memory errors when running grid search inside loops with H2O from Python. In R, calling gc() on each iteration helped. What is the recommended equivalent in Python?

user90772 asked Aug 01 '17 10:08




1 Answer

There appears to be no h2o.gc() function in the Python API. See "How can I debug memory issues?" in the FAQ. If you suspect the back end is holding on to memory it should have released, you could POST the back-end command (GarbageCollect) directly through the REST API. Studying the detailed logs might help confirm whether that is the case.
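As a minimal sketch of that REST call, the snippet below builds and sends the POST with only the standard library. The path `/3/GarbageCollect` and the address `http://localhost:54321` are assumptions (the answer names only the GarbageCollect command), as is a cluster without authentication; the helper names `build_gc_request` and `trigger_gc` are hypothetical.

```python
import urllib.request


def build_gc_request(base_url="http://localhost:54321"):
    """Build (but do not send) a POST request for the GarbageCollect command.

    Assumption: the command is exposed at /3/GarbageCollect on the node.
    """
    url = base_url.rstrip("/") + "/3/GarbageCollect"
    # An empty body is enough; the command takes no parameters here.
    return urllib.request.Request(url, data=b"", method="POST")


def trigger_gc(base_url="http://localhost:54321", timeout=30):
    """Send the request to a running H2O node and return the HTTP status."""
    request = build_gc_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `trigger_gc()` once per loop iteration would roughly mirror the gc() approach from R. If the h2o package is already connected, `h2o.api("POST /3/GarbageCollect")` should send the same request, though that is worth checking against your installed version.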

Summarizing the advice from the comments:

  • Use h2o.remove() on H2O frames and models you no longer need, at the end of the loop.
  • Use h2o.remove_all() (the Python name; h2o.removeAll() is the R function) if you do not need to keep anything around and your loop reloads all the data it needs.
  • Use H2OGridSearch rather than your own loops and your own grid code.
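The first point can be sketched as a loop that keeps only the score from each iteration and frees the model before moving on. This is an illustration under assumptions, not the asker's code: `train_and_score` and `demo_with_h2o` are hypothetical helpers, the GBM estimator, the `ntrees` values, and the RMSE metric are arbitrary choices, and `h2o.remove` is passed in as a parameter so the loop itself does not depend on a running cluster.

```python
def train_and_score(param_sets, fit, remove):
    """Fit one model per parameter set, keep the score, then free the model.

    fit(params) must return (model, score).
    remove(obj) must release obj; with H2O this would be h2o.remove.
    """
    scores = []
    for params in param_sets:
        model = None
        try:
            model, score = fit(params)
            scores.append((params, score))
        finally:
            # Free the model even if scoring raised, so nothing accumulates.
            if model is not None:
                remove(model)
    return scores


def demo_with_h2o(csv_path, target):
    """Hypothetical usage; needs the h2o package and a reachable cluster."""
    import h2o
    from h2o.estimators import H2OGradientBoostingEstimator

    h2o.init()
    frame = h2o.import_file(csv_path)

    def fit(params):
        model = H2OGradientBoostingEstimator(**params)
        model.train(y=target, training_frame=frame)
        return model, model.rmse()

    try:
        return train_and_score([{"ntrees": 20}, {"ntrees": 50}], fit, h2o.remove)
    finally:
        # The frame is no longer needed once every model has been scored.
        h2o.remove(frame)
```

Only plain Python values (parameters and scores) survive each iteration, so the cluster holds at most one model at a time.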

Also be aware that cbind, rbind, and any function that modifies an H2O frame will make a copy of the entire frame. Rethinking how you do your data munging steps can sometimes reduce the memory requirements.

Darren Cook answered Sep 23 '22 18:09
