I am trying to tune some params and the search space is very large. I have 5 dimensions so far and it will probably increase to about 10. The issue is that I think I can get a significant speedup if I can figure out how to multi-process it, but I can't find any good ways to do it. I am using hyperopt and I can't figure out how to make it use more than 1 core. Here is the code that I have without all the irrelevant stuff:
from numpy import random
from pandas import DataFrame
from hyperopt import fmin, tpe, hp, Trials
def calc_result(x):
    huge_df = DataFrame(random.randn(100000, 5), columns=['A', 'B', 'C', 'D', 'E'])
    total = 0
    # Assume that I MUST iterate
    for idx_and_row in huge_df.iterrows():
        idx = idx_and_row[0]
        row = idx_and_row[1]
        # Assume there is no way to optimize here
        curr_sum = row['A'] * x['adjustment_1'] + \
                   row['B'] * x['adjustment_2'] + \
                   row['C'] * x['adjustment_3'] + \
                   row['D'] * x['adjustment_4'] + \
                   row['E'] * x['adjustment_5']
        total += curr_sum
    # In real life I want the total as high as possible, but for the minimizer it has to be a negative value
    total_as_neg = total * -1
    print(total_as_neg)
    return total_as_neg
space = {'adjustment_1': hp.quniform('adjustment_1', 0, 1, 0.001),
         'adjustment_2': hp.quniform('adjustment_2', 0, 1, 0.001),
         'adjustment_3': hp.quniform('adjustment_3', 0, 1, 0.001),
         'adjustment_4': hp.quniform('adjustment_4', 0, 1, 0.001),
         'adjustment_5': hp.quniform('adjustment_5', 0, 1, 0.001)}
trials = Trials()
best = fmin(fn=calc_result,
            space=space,
            algo=tpe.suggest,
            max_evals=20000,
            trials=trials)
As of now, I have 4 cores but I can basically get as many as I need. How can I get hyperopt to use more than 1 core, or is there a library that can multiprocess?
If you have a Mac or Linux (or the Windows Subsystem for Linux), you can add about 10 lines of code to do this in parallel with ray. If you install ray via the latest wheels here, then you can run your script with minimal modifications, shown below, to do parallel/distributed hyperparameter searching with HyperOpt. At a high level, it runs fmin with tpe.suggest and creates a Trials object internally in a parallel fashion.
from numpy import random
from pandas import DataFrame
from hyperopt import fmin, tpe, hp, Trials
def calc_result(x, reporter):  # add a reporter param here
    huge_df = DataFrame(random.randn(100000, 5), columns=['A', 'B', 'C', 'D', 'E'])
    total = 0
    # Assume that I MUST iterate
    for idx_and_row in huge_df.iterrows():
        idx = idx_and_row[0]
        row = idx_and_row[1]
        # Assume there is no way to optimize here
        curr_sum = row['A'] * x['adjustment_1'] + \
                   row['B'] * x['adjustment_2'] + \
                   row['C'] * x['adjustment_3'] + \
                   row['D'] * x['adjustment_4'] + \
                   row['E'] * x['adjustment_5']
        total += curr_sum
    # In real life I want the total as high as possible, but no negation is
    # needed here:
    # total_as_neg = total * -1
    # print(total_as_neg)
    # Ray will negate this by itself to feed into HyperOpt
    reporter(timesteps_total=1, episode_reward_mean=total)
space = {'adjustment_1': hp.quniform('adjustment_1', 0, 1, 0.001),
         'adjustment_2': hp.quniform('adjustment_2', 0, 1, 0.001),
         'adjustment_3': hp.quniform('adjustment_3', 0, 1, 0.001),
         'adjustment_4': hp.quniform('adjustment_4', 0, 1, 0.001),
         'adjustment_5': hp.quniform('adjustment_5', 0, 1, 0.001)}
import ray
import ray.tune as tune
from ray.tune.hpo_scheduler import HyperOptScheduler

ray.init()
tune.register_trainable("calc_result", calc_result)
tune.run_experiments({"experiment": {
    "run": "calc_result",
    "repeat": 20000,
    "config": {"space": space}}}, scheduler=HyperOptScheduler())
You can use multiprocessing to run tasks that, by bypassing Python's Global Interpreter Lock, effectively run concurrently on the multiple processors available.
To run a multiprocessing task, one instantiates a Pool and has this object execute a map function over an iterable object.
The function map simply applies a function over every element of an iterable, like a list, and returns another list with the results in it.
As an example of a search, this gets all the items larger than five from a list:
from itertools import chain
from multiprocessing import Pool

def filter_gt_5(x):
    # Return every element of x that is larger than five.
    return [i for i in x if i > 5]

if __name__ == '__main__':
    p = Pool(4)
    a_list = [6, 5, 4, 3, 7, 8, 10, 9, 2]
    # Find a better way to split your list.
    lists = p.map(filter_gt_5, [a_list[:3], a_list[3:6], a_list[6:]])
    # This will join the sub-lists into one.
    filtered_list = list(chain(*lists))
In your case, you would have to split your search space.
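For example, here is a minimal sketch of that idea, assuming you split the range of adjustment_1 into non-overlapping slices and run an independent fmin over each slice in its own process (run_fmin, the stand-in objective, and the slice boundaries are illustrative, not part of the code above):

from multiprocessing import Pool

from hyperopt import Trials, fmin, hp, tpe

def calc_result(x):
    # Stand-in for the objective from the question; any function of x works.
    return -(x['adjustment_1'] + x['adjustment_2'])

def run_fmin(bounds):
    # Each worker searches its own slice of adjustment_1's range.
    lo, hi = bounds
    space = {'adjustment_1': hp.quniform('adjustment_1', lo, hi, 0.001),
             'adjustment_2': hp.quniform('adjustment_2', 0, 1, 0.001)}
    trials = Trials()
    best = fmin(fn=calc_result, space=space, algo=tpe.suggest,
                max_evals=5000, trials=trials)
    # Return the best loss seen in this slice along with its params.
    return min(trials.losses()), best

if __name__ == '__main__':
    # One non-overlapping slice of adjustment_1 per core.
    sub_ranges = [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)]
    with Pool(4) as p:
        results = p.map(run_fmin, sub_ranges)
    best_loss, best_params = min(results, key=lambda r: r[0])
    print(best_loss, best_params)

Note that each slice gets its own TPE history, so the workers do not share information about past trials; the ray-based answer above avoids that limitation.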