I have a function that has a bunch of parameters. Rather than setting all of the parameters manually, I want to perform a grid search. I have a list of possible values for each parameter. For every possible combination of parameters, I want to run my function, which reports the performance of my algorithm on those parameters. I want to store the results in a many-dimensional matrix, so that afterwards I can just find the index of the maximum performance, which would in turn give me the best parameters. Here is how the code is written now:
param1_list = [p11, p12, p13,...]
param2_list = [p21, p22, p23,...] # not necessarily the same number of values
...
results_size = (len(param1_list), len(param2_list),...)
results = np.zeros(results_size, dtype=float)
for param1_idx in range(len(param1_list)):
    for param2_idx in range(len(param2_list)):
        ...
        param1 = param1_list[param1_idx]
        param2 = param2_list[param2_idx]
        ...
        results[param1_idx, param2_idx, ...] = my_func(param1, param2, ...)
# argmax returns a flat index, so unravel it into one index per parameter
max_index = np.unravel_index(np.argmax(results), results.shape) # indices of best parameters!
I want to keep the first part, where I define the lists as-is, since I want to easily be able to manipulate the values over which I search.
I also want to end up with the results matrix as is, since I will be visualizing how changing different parameters affects the performance of the algorithm.
The bit in the middle, though, is quite repetitive and bulky (especially because I have lots of parameters, and I might want to add or remove parameters), and I feel like there should be a more succinct/elegant way to initialize the results matrix, iterate over all of the indices, and set the appropriate parameters.
So, is there?
You can use ParameterGrid from the sklearn module. It originally lived in sklearn.grid_search; in current scikit-learn releases it is in sklearn.model_selection.
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.ParameterGrid.html
Example
from sklearn.model_selection import ParameterGrid
param_grid = {'param1': [value1, value2, value3], 'paramN' : [value1, value2, valueM]}
grid = ParameterGrid(param_grid)
for params in grid:
    your_function(params['param1'], params['paramN'])
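ParameterGrid yields one dict per combination, but it does not by itself produce the N-dimensional results array the question asks for. A minimal sketch of that part, using NumPy's np.ndindex in place of ParameterGrid, might look like the following. The parameter values and the body of my_func are hypothetical stand-ins, not taken from the question.

```python
import numpy as np

# Hypothetical parameter lists and scoring function, for illustration only.
param_lists = [
    [0.1, 1.0, 10.0],   # candidate values for the first parameter
    [1, 2],             # candidate values for the second parameter
    [True, False],      # candidate values for the third parameter
]

def my_func(p1, p2, p3):
    # Stand-in for the real performance measure.
    return -abs(p1 - 1.0) - abs(p2 - 2) + (0.5 if p3 else 0.0)

# One axis per parameter, sized by the number of candidate values.
results = np.zeros([len(values) for values in param_lists], dtype=float)

# np.ndindex yields every index tuple of the given shape.
for idx in np.ndindex(results.shape):
    params = [values[i] for values, i in zip(param_lists, idx)]
    results[idx] = my_func(*params)

# Convert the flat argmax into one index per parameter.
best_idx = np.unravel_index(np.argmax(results), results.shape)
best_params = [values[i] for values, i in zip(param_lists, best_idx)]
```

Adding or removing a parameter then only means editing param_lists and the signature of my_func; the loop itself does not change.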
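I think scipy.optimize.brute is what you're after.
>>> from scipy.optimize import brute
>>> a,f,g,j = brute(my_func,[param1_list,param2_list,...],full_output = True)
Note that if the full_output argument is True, the evaluation grid will be returned. Keep in mind that brute minimizes my_func, that it calls my_func with a single array of parameters, and that each entry of its second argument is interpreted as a slice or a (low, high) range rather than as an arbitrary list of values.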
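As a rough sketch of how brute is called in practice, assuming the parameters lie on evenly spaced grids that can be written as slices: the cost function and the ranges below are hypothetical. Because brute minimizes, a performance score would need to be negated first, and finish=None is passed here to skip the default polishing step so the result stays on the grid.

```python
import numpy as np
from scipy.optimize import brute

def cost(x):
    # brute passes all parameters as one 1-D array.
    a, b = x
    # Hypothetical cost with its minimum at a=1, b=2.
    return (a - 1.0) ** 2 + (b - 2.0) ** 2

# Each slice is (start, stop, step): a in {0, 1, 2}, b in {0, 1, 2, 3}.
ranges = (slice(0, 3, 1), slice(0, 4, 1))

# With full_output=True, brute also returns the grid and the evaluated values.
x0, fval, grid, jout = brute(cost, ranges, full_output=True, finish=None)
```

Here jout plays the role of the results matrix from the question, with one axis per parameter.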
The solutions from John Vinyard and Sibelius Seraphini are good built-in options, but if you're looking for more flexibility, you could use broadcasting + vectorize. Use ix_ to produce a broadcastable set of parameters, and then pass those to a vectorized version of the function (but see caveat below):
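import numpy

a, b, c = range(3), range(3), range(3)

def my_func(x, y, z):
    # returns three scores per combination: mean, product, max
    return (x + y + z) / 3.0, x * y * z, max(x, y, z)

# vectorize returns one grid per returned value
grids = numpy.vectorize(my_func)(*numpy.ix_(a, b, c))
mean_grid, product_grid, max_grid = grids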
With the following results for mean_grid:
array([[[ 0. , 0.33333333, 0.66666667],
[ 0.33333333, 0.66666667, 1. ],
[ 0.66666667, 1. , 1.33333333]],
[[ 0.33333333, 0.66666667, 1. ],
[ 0.66666667, 1. , 1.33333333],
[ 1. , 1.33333333, 1.66666667]],
[[ 0.66666667, 1. , 1.33333333],
[ 1. , 1.33333333, 1.66666667],
[ 1.33333333, 1.66666667, 2. ]]])
product_grid:
array([[[0, 0, 0],
[0, 0, 0],
[0, 0, 0]],
[[0, 0, 0],
[0, 1, 2],
[0, 2, 4]],
[[0, 0, 0],
[0, 2, 4],
[0, 4, 8]]])
and max_grid:
array([[[0, 1, 2],
[1, 1, 2],
[2, 2, 2]],
[[1, 1, 2],
[1, 1, 2],
[2, 2, 2]],
[[2, 2, 2],
[2, 2, 2],
[2, 2, 2]]])
Note that this may not be the fastest approach. vectorize is handy, but it's limited by the speed of the function passed to it, and Python functions are slow. If you can rewrite my_func to use numpy ufuncs, you can get your grids faster. Something like this:
>>> def mean(a, b, c):
...     return (a + b + c) / 3.0
...
>>> mean(*numpy.ix_(a, b, c))
array([[[ 0. , 0.33333333, 0.66666667],
[ 0.33333333, 0.66666667, 1. ],
[ 0.66666667, 1. , 1.33333333]],
[[ 0.33333333, 0.66666667, 1. ],
[ 0.66666667, 1. , 1.33333333],
[ 1. , 1.33333333, 1.66666667]],
[[ 0.66666667, 1. , 1.33333333],
[ 1. , 1.33333333, 1.66666667],
[ 1.33333333, 1.66666667, 2. ]]])
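To connect this back to the question's goal of finding the best parameters, the grid can be searched with argmax and the flat index unravelled into one index per axis. This is a small sketch reusing the ranges from the example above; the score function is a hypothetical one built only from NumPy-friendly arithmetic.

```python
import numpy as np

# Candidate values, matching the ranges used in the example above.
a, b, c = range(3), range(3), range(3)

def score(x, y, z):
    # Illustrative score made of plain arithmetic, so it broadcasts.
    return x * y - z

grid = score(*np.ix_(a, b, c))   # one axis per parameter, shape (3, 3, 3)

# argmax works on the flattened array; unravel_index recovers per-axis indices.
best_idx = np.unravel_index(np.argmax(grid), grid.shape)
best_params = (a[best_idx[0]], b[best_idx[1]], c[best_idx[2]])
```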
You may use numpy meshgrid for this:
import numpy as np
x = range(1, 5)
y = range(10)
xx, yy = np.meshgrid(x, y)
results = my_func(xx, yy)
Note that your function must be able to work with numpy arrays.
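One detail worth checking with meshgrid is the axis order. By default it uses indexing="xy", so the output has shape (len(y), len(x)); passing indexing="ij" gives the results[param1_idx, param2_idx] layout from the question. A short sketch, where my_func is a hypothetical stand-in that works elementwise on arrays:

```python
import numpy as np

x = range(1, 5)   # 4 candidate values
y = range(10)     # 10 candidate values

def my_func(xx, yy):
    # Hypothetical stand-in that operates elementwise on arrays.
    return xx * 10 + yy

xx_xy, yy_xy = np.meshgrid(x, y)                  # default indexing="xy"
xx_ij, yy_ij = np.meshgrid(x, y, indexing="ij")   # matrix-style indexing

# results[i, j] corresponds to my_func(x[i], y[j])
results = my_func(xx_ij, yy_ij)
```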