 

Memory leak for Optuna trial with multiprocessing

The Background

I have a machine learning pipeline that consists of N boosted models (LGBMRegressor), each with identical hyperparameters. Each of the N LGBMRegressors is trained on a separate chunk of data. My current workstation has a lot of cores, so I train the regressors in parallel, each in its own worker process.

The Problem

I am trying to tune the parameters that go into the LGBMRegressors with Optuna. When I use multiprocessing inside an Optuna trial, memory usage grows with every trial until I run out of memory. Can I use multiprocessing inside an Optuna trial without this memory leak?

Minimal Reproducible Example

import optuna
import pandas as pd
import numpy as np
import multiprocessing
from lightgbm import LGBMRegressor

N = 500
n_cores = 30
rows_per_N = 1000
cols_per_N = 50
data = [[np.random.normal(size=(rows_per_N, cols_per_N)), np.random.normal(size=(rows_per_N,))] for i in range(N)]

def get_metric(args):
    (X, y), params = args
    model = LGBMRegressor(**params)
    model.fit(X, y)
    return np.abs(model.predict(X) - y)


def objective(trial):
    param = {
        "n_jobs": 1,  # n_jobs expects an int, not the string "1"
        "num_leaves": trial.suggest_int("num_leaves", 2, 256)
    }
    lgb_params = [param for _ in range(N)]
    p = multiprocessing.Pool(n_cores)
    results = p.map(get_metric, zip(data, lgb_params))
    return np.mean(results)


study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=100)

Alternative Solutions

I have rewritten the above code as a plain for loop, and it has no memory issues. The drawback is that it is roughly 30x slower than the multiprocessed solution.
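For reference, the serial fallback follows this shape. This is a minimal sketch: a least-squares fit stands in for LGBMRegressor (a hypothetical substitution so the sketch runs without lightgbm installed), and the data sizes are shrunk.

```python
import numpy as np

def get_metric_serial(X, y):
    # Hypothetical stand-in for LGBMRegressor.fit/.predict: an ordinary
    # least-squares fit, used only so this sketch runs without lightgbm.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.abs(X @ coef - y)

rng = np.random.default_rng(0)
N, rows_per_N, cols_per_N = 5, 100, 10  # small sizes for the sketch
data = [(rng.normal(size=(rows_per_N, cols_per_N)), rng.normal(size=rows_per_N))
        for _ in range(N)]

# The serial version: one chunk after another, no Pool, so no worker
# processes to leak -- but no parallel speedup either.
results = [get_metric_serial(X, y) for X, y in data]
score = float(np.mean(results))
```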

asked Sep 13 '25 by Ottpocket

1 Answer

As suggested by @J. M. Arnold, you should use a context manager for the pool so that it is reliably shut down when the trial finishes (exiting the with block calls terminate(), so worker processes do not linger) and you avoid the potential memory leak. Additionally, per the Optuna documentation, you can reduce out-of-memory errors by running the garbage collector after each trial: set gc_after_trial to True in the study.optimize method.

import optuna
import pandas as pd
import numpy as np
import multiprocessing
from lightgbm import LGBMRegressor

N = 500
n_cores = 30
rows_per_N = 1000
cols_per_N = 50
data = [[np.random.normal(size=(rows_per_N, cols_per_N)), np.random.normal(size=(rows_per_N,))] for i in range(N)]

def get_metric(args):
    (X, y), params = args
    model = LGBMRegressor(**params)
    model.fit(X, y)
    return np.abs(model.predict(X) - y)


def objective(trial):
    param = {
        "n_jobs": 1,  # n_jobs expects an int, not the string "1"
        "num_leaves": trial.suggest_int("num_leaves", 2, 256)
    }
    lgb_params = [param for _ in range(N)]
    with multiprocessing.Pool(n_cores) as p:
        results = p.map(get_metric, zip(data, lgb_params))
        return np.mean(results)


study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=100, gc_after_trial=True)
answered Sep 15 '25 by vht981230