Python and HyperOpt: How to make multi-process grid searching?

I am trying to tune some params and the search space is very large. I have 5 dimensions so far and it will probably increase to about 10. The issue is that I think I can get a significant speedup if I can figure out how to multi-process it, but I can't find any good ways to do it. I am using hyperopt and I can't figure out how to make it use more than 1 core. Here is the code that I have without all the irrelevant stuff:

from numpy    import random
from pandas   import DataFrame
from hyperopt import fmin, tpe, hp, Trials


def calc_result(x):

    huge_df = DataFrame(random.randn(100000, 5), columns=['A', 'B', 'C', 'D', 'E'])

    total = 0

    # Assume that I MUST iterate
    for idx, row in huge_df.iterrows():
        # Assume there is no way to optimize here
        curr_sum = row['A'] * x['adjustment_1'] + \
                   row['B'] * x['adjustment_2'] + \
                   row['C'] * x['adjustment_3'] + \
                   row['D'] * x['adjustment_4'] + \
                   row['E'] * x['adjustment_5']

        total += curr_sum

    # In real life I want the total as high as possible, but for the minimizer it has to be a negative value
    total_as_neg = total * -1

    print(total_as_neg)

    return total_as_neg


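# hp.quniform samples each adjustment on a grid over [0, 1] with step 0.001,
# so each dimension has about 1000 candidate values and the joint space is
# far too large to enumerate exhaustively.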
space = {'adjustment_1': hp.quniform('adjustment_1', 0, 1, 0.001),
         'adjustment_2': hp.quniform('adjustment_2', 0, 1, 0.001),
         'adjustment_3': hp.quniform('adjustment_3', 0, 1, 0.001),
         'adjustment_4': hp.quniform('adjustment_4', 0, 1, 0.001),
         'adjustment_5': hp.quniform('adjustment_5', 0, 1, 0.001)}

trials = Trials()

best = fmin(fn        = calc_result,
            space     = space,
            algo      = tpe.suggest,
            max_evals = 20000,
            trials    = trials)

As of now I have 4 cores, but I can get as many as I need. How can I get hyperopt to use more than one core, or is there another library that can parallelize the search?

asked Mar 19 '18 by user1367204


2 Answers

If you have a Mac or Linux machine (or the Windows Subsystem for Linux), you can add about 10 lines of code to do this in parallel with ray. If you install ray via the latest wheels here, then you can run your script with minimal modifications, shown below, to do parallel/distributed grid searching with HyperOpt. At a high level, it runs fmin with tpe.suggest and creates a Trials object internally in a parallel fashion.

from numpy    import random
from pandas   import DataFrame
from hyperopt import fmin, tpe, hp, Trials


def calc_result(x, reporter):  # add a reporter param here

    huge_df = DataFrame(random.randn(100000, 5), columns=['A', 'B', 'C', 'D', 'E'])

    total = 0

    # Assume that I MUST iterate
    for idx, row in huge_df.iterrows():
        # Assume there is no way to optimize here
        curr_sum = row['A'] * x['adjustment_1'] + \
                   row['B'] * x['adjustment_2'] + \
                   row['C'] * x['adjustment_3'] + \
                   row['D'] * x['adjustment_4'] + \
                   row['E'] * x['adjustment_5']

        total += curr_sum

    # In real life I want the total as high as possible, but for the minimizer
    # it would have to be negated. Ray negates the reported reward by itself to
    # feed into HyperOpt, so report the raw total instead of returning a value.
    reporter(timesteps_total=1, episode_reward_mean=total)


space = {'adjustment_1': hp.quniform('adjustment_1', 0, 1, 0.001),
         'adjustment_2': hp.quniform('adjustment_2', 0, 1, 0.001),
         'adjustment_3': hp.quniform('adjustment_3', 0, 1, 0.001),
         'adjustment_4': hp.quniform('adjustment_4', 0, 1, 0.001),
         'adjustment_5': hp.quniform('adjustment_5', 0, 1, 0.001)}

import ray
import ray.tune as tune
from ray.tune.hpo_scheduler import HyperOptScheduler

ray.init()
tune.register_trainable("calc_result", calc_result)
tune.run_experiments({"experiment": {
    "run": "calc_result",
    "repeat": 20000,
    "config": {"space": space}}}, scheduler=HyperOptScheduler())
answered Oct 13 '22 by richliaw
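The ray.tune.hpo_scheduler import above comes from an early ray release and no longer exists in later versions. Under the later Ray 1.x API, the same idea would look roughly like the sketch below, reusing the original single-argument calc_result and space from the question; HyperOptSearch, tune.report, and the metric name total are the assumed pieces:

import ray
from ray import tune
from ray.tune.suggest.hyperopt import HyperOptSearch

def trainable(config):
    # `config` is one point sampled by HyperOpt from `space`.
    # calc_result returns the negated total, so flip the sign back
    # because Tune is told to maximize `total` below.
    tune.report(total=-calc_result(config))

ray.init()
algo = HyperOptSearch(space, metric="total", mode="max")
tune.run(trainable, search_alg=algo, num_samples=20000)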


You can use the multiprocessing module to run tasks in separate processes. Because each process gets its own interpreter, this sidesteps Python's Global Interpreter Lock and the tasks run truly in parallel on the available cores.

To run a multiprocessing task, you instantiate a Pool and have it execute a map function over an iterable.

Pool.map applies a function to every element of an iterable, such as a list, and returns a list of the results.

As an example of such a search, this gets all items larger than five from a list:

from itertools import chain
from multiprocessing import Pool

def filter_gt_5(chunk):
    # Return every item in this chunk that is larger than five.
    return [i for i in chunk if i > 5]

if __name__ == '__main__':
    p = Pool(4)
    a_list = [6, 5, 4, 3, 7, 8, 10, 9, 2]
    # Find a better way to split your list.
    chunks = [a_list[:3], a_list[3:6], a_list[6:]]
    lists = p.map(filter_gt_5, chunks)
    # This joins the per-chunk results into one list.
    filtered_list = list(chain(*lists))
    print(filtered_list)  # [6, 7, 8, 10, 9]

In your case, you would have to split your search space up in the same way; one variation on this idea is sketched below.
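Since hyperopt's TPE algorithm proposes trials sequentially, a more direct way to use a Pool here is to split the per-trial work rather than the search space itself. A minimal sketch under that assumption, where chunk_sum and calc_result_parallel are hypothetical names and the 4-way split is illustrative:

from multiprocessing import Pool

import numpy as np
from numpy import random
from pandas import DataFrame

def chunk_sum(args):
    # Compute the partial total for one slice of the DataFrame.
    chunk, x = args
    total = 0
    for idx, row in chunk.iterrows():
        total += (row['A'] * x['adjustment_1'] +
                  row['B'] * x['adjustment_2'] +
                  row['C'] * x['adjustment_3'] +
                  row['D'] * x['adjustment_4'] +
                  row['E'] * x['adjustment_5'])
    return total

def calc_result_parallel(x, n_procs=4):
    huge_df = DataFrame(random.randn(100000, 5),
                        columns=['A', 'B', 'C', 'D', 'E'])
    # Split the rows into one chunk per process and sum the partials.
    chunks = np.array_split(huge_df, n_procs)
    with Pool(n_procs) as p:
        partials = p.map(chunk_sum, [(c, x) for c in chunks])
    return -sum(partials)  # negate for the minimizer

On Windows, the fmin call would also need to sit under an if __name__ == '__main__': guard, since worker processes re-import the module.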

answered Oct 13 '22 by Gabriel Fernandez
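Hyperopt also ships its own mechanism for evaluating trials in parallel: swap Trials for hyperopt.mongoexp.MongoTrials and start one hyperopt-mongo-worker process per core. A minimal sketch, assuming a MongoDB instance is running locally (the database name hyperopt_db and the exp_key are placeholders):

from hyperopt import fmin, tpe
from hyperopt.mongoexp import MongoTrials

# Placeholder connection string; requires a running MongoDB instance.
trials = MongoTrials('mongo://localhost:27017/hyperopt_db/jobs', exp_key='exp1')
best = fmin(fn=calc_result, space=space, algo=tpe.suggest,
            max_evals=20000, trials=trials)

Each process started with hyperopt-mongo-worker --mongo=localhost:27017/hyperopt_db pulls pending trials from the queue, so running four workers keeps four cores busy.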