I am trying to tune some parameters and the search space is very large. I have 5 dimensions so far and it will probably grow to about 10. I think I could get a significant speedup by multiprocessing the search, but I can't find a good way to do it. I am using hyperopt and I can't figure out how to make it use more than 1 core. Here is the code, with the irrelevant parts stripped out:
from numpy import random
from pandas import DataFrame
from hyperopt import fmin, tpe, hp, Trials

def calc_result(x):
    huge_df = DataFrame(random.randn(100000, 5), columns=['A', 'B', 'C', 'D', 'E'])
    total = 0
    # Assume that I MUST iterate
    for idx_and_row in huge_df.iterrows():
        idx = idx_and_row[0]
        row = idx_and_row[1]
        # Assume there is no way to optimize here
        curr_sum = row['A'] * x['adjustment_1'] + \
                   row['B'] * x['adjustment_2'] + \
                   row['C'] * x['adjustment_3'] + \
                   row['D'] * x['adjustment_4'] + \
                   row['E'] * x['adjustment_5']
        total += curr_sum
    # In real life I want the total as high as possible, but for the minimizer it has to be a negative value
    total_as_neg = total * -1
    print(total_as_neg)
    return total_as_neg

space = {'adjustment_1': hp.quniform('adjustment_1', 0, 1, 0.001),
         'adjustment_2': hp.quniform('adjustment_2', 0, 1, 0.001),
         'adjustment_3': hp.quniform('adjustment_3', 0, 1, 0.001),
         'adjustment_4': hp.quniform('adjustment_4', 0, 1, 0.001),
         'adjustment_5': hp.quniform('adjustment_5', 0, 1, 0.001)}

trials = Trials()

best = fmin(fn=calc_result,
            space=space,
            algo=tpe.suggest,
            max_evals=20000,
            trials=trials)
As of now I have 4 cores, but I can get basically as many as I need. How can I get hyperopt to use more than 1 core, or is there a library that can handle the multiprocessing?
If you have a Mac or Linux (or the Windows Subsystem for Linux), you can add about 10 lines of code to do this in parallel with ray. If you install ray via the latest wheels here, then you can run your script with the minimal modifications shown below to do parallel/distributed searching with HyperOpt. At a high level, it runs fmin with tpe.suggest and creates a Trials object internally, in a parallel fashion.
from numpy import random
from pandas import DataFrame
from hyperopt import fmin, tpe, hp, Trials

def calc_result(x, reporter):  # add a reporter param here
    huge_df = DataFrame(random.randn(100000, 5), columns=['A', 'B', 'C', 'D', 'E'])
    total = 0
    # Assume that I MUST iterate
    for idx_and_row in huge_df.iterrows():
        idx = idx_and_row[0]
        row = idx_and_row[1]
        # Assume there is no way to optimize here
        curr_sum = row['A'] * x['adjustment_1'] + \
                   row['B'] * x['adjustment_2'] + \
                   row['C'] * x['adjustment_3'] + \
                   row['D'] * x['adjustment_4'] + \
                   row['E'] * x['adjustment_5']
        total += curr_sum
    # In real life I want the total as high as possible, but no negation
    # or return is needed here:
    # total_as_neg = total * -1
    # print(total_as_neg)
    # Ray will negate this by itself to feed into HyperOpt
    reporter(timesteps_total=1, episode_reward_mean=total)

space = {'adjustment_1': hp.quniform('adjustment_1', 0, 1, 0.001),
         'adjustment_2': hp.quniform('adjustment_2', 0, 1, 0.001),
         'adjustment_3': hp.quniform('adjustment_3', 0, 1, 0.001),
         'adjustment_4': hp.quniform('adjustment_4', 0, 1, 0.001),
         'adjustment_5': hp.quniform('adjustment_5', 0, 1, 0.001)}

import ray
import ray.tune as tune
from ray.tune.hpo_scheduler import HyperOptScheduler

ray.init()
tune.register_trainable("calc_result", calc_result)
tune.run_experiments({"experiment": {
    "run": "calc_result",
    "repeat": 20000,
    "config": {"space": space}}}, scheduler=HyperOptScheduler())
You can use multiprocessing to run tasks truly in parallel across the available processors: because each worker is a separate process with its own interpreter, Python's Global Interpreter Lock is not a bottleneck.
To run a multiprocessing task, instantiate a Pool and have it execute a map function over an iterable.
The function map applies a function to every element of an iterable, like a list, and returns a new list with the results.
As an example of such a search, this gets all items larger than five from a list:

from itertools import chain
from multiprocessing import Pool

def filter_gt_5(x):
    # collect every item in this chunk that is greater than five
    return [i for i in x if i > 5]

if __name__ == '__main__':
    p = Pool(4)
    a_list = [6, 5, 4, 3, 7, 8, 10, 9, 2]
    # find a better way to split your list.
    lists = p.map(filter_gt_5, [a_list[:3], a_list[3:6], a_list[6:]])
    # this will join the sub-lists into one.
    filtered_list = list(chain(*lists))
In your case, you would have to split your search space in the same way; one way to do that with hyperopt is sketched below.
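This is only my own sketch of that idea, not a hyperopt built-in: it runs several independent fmin searches in worker processes, each with its own random seed and a share of the 20000-evaluation budget, and keeps the best result. It assumes calc_result and space are defined at module level (as in the question) so they can be pickled.

import numpy as np
from multiprocessing import Pool
from hyperopt import fmin, tpe, Trials

def run_search(seed):
    # each worker runs an independent TPE search over the full space
    trials = Trials()
    best = fmin(fn=calc_result,
                space=space,
                algo=tpe.suggest,
                max_evals=5000,  # 20000 evaluations split across 4 workers
                trials=trials,
                # newer hyperopt releases expect np.random.default_rng(seed) here
                rstate=np.random.RandomState(seed))
    return best, min(trials.losses())

if __name__ == '__main__':
    with Pool(4) as p:
        results = p.map(run_search, range(4))
    best_params, best_loss = min(results, key=lambda r: r[1])
    print(best_params, best_loss)

Because the searches are independent, TPE cannot share information between workers, so this trades some sample efficiency for wall-clock speed.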