I started using Ray for distributed machine learning and I already have some issues. The memory usage simply grows until the program crashes. Although I clear the list constantly, the memory is somehow leaking. Any idea why?
My specs:
OS Platform and Distribution: Ubuntu 16.04
Ray installed from: binary
Ray version: 0.6.5
Python version: 3.6.8
I already tried using the experimental queue instead of the DataServer class, but the problem is still the same.
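For reference, the queue variant looked roughly like this (the import path is my recollection of the 0.6.x experimental API and may differ across versions; later releases moved it to ray.util.queue):

from ray.experimental.queue import Queue

queue = Queue()
queue.put(np.ones(10))  # producer side, in place of dataList.put.remote(...)
item = queue.get()      # consumer side, in place of dataList.pop.remote()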
import numpy as np
import ray
import time

ray.init(redis_max_memory=100000000)

@ray.remote
class Runner():
    def __init__(self, dataList):
        self.run(dataList)

    def run(self, dataList):
        # Push data onto the server as fast as possible.
        while True:
            dataList.put.remote(np.ones(10))

@ray.remote
class Optimizer():
    def __init__(self, dataList):
        self.optimize(dataList)

    def optimize(self, dataList):
        # Pop data off the server as fast as possible.
        while True:
            dataList.pop.remote()

@ray.remote
class DataServer():
    def __init__(self):
        self.dataList = []

    def put(self, data):
        self.dataList.append(data)

    def pop(self):
        if len(self.dataList) != 0:
            return self.dataList.pop()

    def get_size(self):
        return len(self.dataList)

dataServer = DataServer.remote()
runner = Runner.remote(dataServer)
optimizer1 = Optimizer.remote(dataServer)
optimizer2 = Optimizer.remote(dataServer)

# Report the server's list size once per second.
while True:
    time.sleep(1)
    print(ray.get(dataServer.get_size.remote()))
After running for some time I get this error message:
I recently ran into a similar problem and found that if you are frequently putting large objects (using ray.put()), you need to either:
- manually adjust the thresholds that the Python garbage collector uses (a sketch follows this list), or
- call gc.collect() on a regular basis.
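For the first option, a minimal sketch of adjusting the thresholds (the values here are illustrative, not tuned):

import gc

# The default thresholds, roughly (700, 10, 10), count object allocations,
# not bytes, so a handful of huge arrays may never trigger a collection.
# Lowering the generation-0 threshold makes collections run more often.
gc.set_threshold(100, 10, 10)
print(gc.get_threshold())  # -> (100, 10, 10)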
I implemented a method that checks the amount of used memory and then calls the garbage collector.
The problem is that the default thresholds are based on the number of objects, but if you are putting large objects, the gc may never get called until you run out of memory. My utility method is as follows:
import gc
import psutil

def auto_garbage_collect(pct=80.0):
    """
    Call the garbage collector if memory used is greater than pct percent of
    total available memory. This is called to deal with an issue in Ray not
    freeing up used memory.

    pct - Default value of 80%. Amount of memory in use that triggers the
          garbage collection call.
    """
    if psutil.virtual_memory().percent >= pct:
        gc.collect()
    return
Calling this will solve the problem when it is related to pushing large objects via ray.put() and running out of memory.
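As a usage sketch (the placement is my suggestion, not prescribed), you could call it once per iteration of the monitoring loop from the question:

# Hypothetical placement: check memory pressure each second alongside the
# size report; gc.collect() only runs when usage exceeds the threshold.
while True:
    time.sleep(1)
    auto_garbage_collect()
    print(ray.get(dataServer.get_size.remote()))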
A quick fix is to use:
ray.shutdown()
I code in Spyder, which displays the percentage of memory used in the bottom right corner. When I ran the same script multiple times, I noticed that the memory percentage increased in increments of 3% (based on the 8 GB of RAM I have). This made me wonder if Ray was storing something like a session, with each increment corresponding to one session.
It turns out that it does. ray.shutdown() ends the session. However, you need to call ray.init() again if you want to run your script again. Also, make sure you place this in the correct location so as not to end Ray while it is still needed.
This solves the problem of memory usage increasing when running a script several times.
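A minimal sketch of one safe placement (the try/finally structure is my suggestion, not from the original script):

import ray

ray.init()
try:
    pass  # ... run the workload ...
finally:
    # End the session so repeated runs in the same console (e.g. Spyder)
    # do not each leave a live Ray instance behind.
    ray.shutdown()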
I do not know Ray very well, but ray.init() has various arguments relating to addresses. I am sure there must be a way to make Ray run on the same session via one of these arguments (a speculative sketch follows). This is speculation; I have not attempted any of this yet. Perhaps you can figure this out?
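Speculatively, attaching to an already-running instance would look roughly like this (in Ray 0.6.x the keyword was redis_address; newer versions use address, so check your version's docs):

# Hypothetical: start a long-lived instance once, e.g. with
#   ray start --head
# on the command line, then have each script run attach to it instead of
# creating (and leaking) a fresh session.
ray.init(redis_address="127.0.0.1:6379")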