I am trying to tune some parameters and the search space is very large. I have 5 dimensions so far and it will probably grow to about 10. I think I could get a significant speedup by multiprocessing the search, but I can't find a good way to do it. I am using hyperopt and I can't figure out how to make it use more than 1 core. Here is the code, with the irrelevant parts stripped out:
from numpy import random
from pandas import DataFrame
from hyperopt import fmin, tpe, hp, Trials

def calc_result(x):
    huge_df = DataFrame(random.randn(100000, 5), columns=['A', 'B', 'C', 'D', 'E'])
    total = 0
    # Assume that I MUST iterate
    for idx_and_row in huge_df.iterrows():
        idx = idx_and_row[0]
        row = idx_and_row[1]
        # Assume there is no way to optimize here
        curr_sum = row['A'] * x['adjustment_1'] + \
                   row['B'] * x['adjustment_2'] + \
                   row['C'] * x['adjustment_3'] + \
                   row['D'] * x['adjustment_4'] + \
                   row['E'] * x['adjustment_5']
        total += curr_sum
    # In real life I want the total as high as possible, but for the minimizer it has to be a negative value
    total_as_neg = total * -1
    print(total_as_neg)
    return total_as_neg

space = {'adjustment_1': hp.quniform('adjustment_1', 0, 1, 0.001),
         'adjustment_2': hp.quniform('adjustment_2', 0, 1, 0.001),
         'adjustment_3': hp.quniform('adjustment_3', 0, 1, 0.001),
         'adjustment_4': hp.quniform('adjustment_4', 0, 1, 0.001),
         'adjustment_5': hp.quniform('adjustment_5', 0, 1, 0.001)}

trials = Trials()

best = fmin(fn=calc_result,
            space=space,
            algo=tpe.suggest,
            max_evals=20000,
            trials=trials)
As of now I have 4 cores, but I can get basically as many as I need. How can I get hyperopt to use more than 1 core, or is there a library that can handle the multiprocessing?
If you have a Mac or Linux (or the Windows Subsystem for Linux), you can add about 10 lines of code to do this in parallel with ray. If you install ray via the latest wheels here, then you can run your script with the minimal modifications shown below to do parallel/distributed searching with HyperOpt. At a high level, it runs fmin with tpe.suggest and creates a Trials object internally, in a parallel fashion.
from numpy import random
from pandas import DataFrame
from hyperopt import fmin, tpe, hp, Trials

def calc_result(x, reporter):  # add a reporter param here
    huge_df = DataFrame(random.randn(100000, 5), columns=['A', 'B', 'C', 'D', 'E'])
    total = 0
    # Assume that I MUST iterate
    for idx_and_row in huge_df.iterrows():
        idx = idx_and_row[0]
        row = idx_and_row[1]
        # Assume there is no way to optimize here
        curr_sum = row['A'] * x['adjustment_1'] + \
                   row['B'] * x['adjustment_2'] + \
                   row['C'] * x['adjustment_3'] + \
                   row['D'] * x['adjustment_4'] + \
                   row['E'] * x['adjustment_5']
        total += curr_sum
    # In real life I want the total as high as possible, but no negation
    # or return is needed here:
    # total_as_neg = total * -1
    # print(total_as_neg)
    # Ray will negate this by itself to feed into HyperOpt
    reporter(timesteps_total=1, episode_reward_mean=total)

space = {'adjustment_1': hp.quniform('adjustment_1', 0, 1, 0.001),
         'adjustment_2': hp.quniform('adjustment_2', 0, 1, 0.001),
         'adjustment_3': hp.quniform('adjustment_3', 0, 1, 0.001),
         'adjustment_4': hp.quniform('adjustment_4', 0, 1, 0.001),
         'adjustment_5': hp.quniform('adjustment_5', 0, 1, 0.001)}

import ray
import ray.tune as tune
from ray.tune.hpo_scheduler import HyperOptScheduler

ray.init()
tune.register_trainable("calc_result", calc_result)
tune.run_experiments({"experiment": {
    "run": "calc_result",
    "repeat": 20000,
    "config": {"space": space}}}, scheduler=HyperOptScheduler())
You can use multiprocessing to run tasks truly in parallel across the available processors: because each worker is a separate process with its own interpreter, Python's Global Interpreter Lock is not a bottleneck.
To run a multiprocessing task, instantiate a Pool and have it execute a map function over an iterable.
The function map applies a function to every element of an iterable, like a list, and returns a new list with the results.
As an example of such a search, this gets all items larger than five from a list:

from itertools import chain
from multiprocessing import Pool

def filter_gt_5(x):
    # collect every item in this chunk that is greater than five
    return [i for i in x if i > 5]

if __name__ == '__main__':
    p = Pool(4)
    a_list = [6, 5, 4, 3, 7, 8, 10, 9, 2]
    # find a better way to split your list.
    lists = p.map(filter_gt_5, [a_list[:3], a_list[3:6], a_list[6:]])
    # this will join the sub-lists into one.
    filtered_list = list(chain(*lists))
In your case, you would have to split your search space in the same way; one way to do that with hyperopt is sketched below.
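This is only my own sketch of that idea, not a hyperopt built-in: it runs several independent fmin searches in worker processes, each with its own random seed and a share of the 20000-evaluation budget, and keeps the best result. It assumes calc_result and space are defined at module level (as in the question) so they can be pickled.

import numpy as np
from multiprocessing import Pool
from hyperopt import fmin, tpe, Trials

def run_search(seed):
    # each worker runs an independent TPE search over the full space
    trials = Trials()
    best = fmin(fn=calc_result,
                space=space,
                algo=tpe.suggest,
                max_evals=5000,  # 20000 evaluations split across 4 workers
                trials=trials,
                # newer hyperopt releases expect np.random.default_rng(seed) here
                rstate=np.random.RandomState(seed))
    return best, min(trials.losses())

if __name__ == '__main__':
    with Pool(4) as p:
        results = p.map(run_search, range(4))
    best_params, best_loss = min(results, key=lambda r: r[1])
    print(best_params, best_loss)

Because the searches are independent, TPE cannot share information between workers, so this trades some sample efficiency for wall-clock speed.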