
scipy.optimize.leastsq calls objective function with NaN

I am using scipy.optimize.leastsq to attempt to fit a number of parameters to real-world data in the presence of noise. The objective function occasionally gets called with NaNs from within minpack. Is this the expected behavior of scipy.optimize.leastsq? Is there any better option than just returning NaN residuals under this condition?

The following code demonstrates the behavior:

import scipy.optimize
import numpy as np

xF = np.array([1.0, 2.0, 3.0, 4.0]) # Target value for fit
NOISE_LEVEL = 1e-6 # The random noise level
RETURN_LEN = 1000  # The objective function return vector length

def func(x):
    if np.isnan(np.sum(x)):
        raise ValueError('Invalid x: %s' % x)
    v = np.random.rand(RETURN_LEN) * NOISE_LEVEL
    v[:len(x)] += xF - x
    return v

iteration = 0
while True:
    iteration += 1
    x = np.zeros(len(xF))
    y, cov = scipy.optimize.leastsq(func, x)
    print('%04d %s' % (iteration, y))

The Jacobian is computed numerically. In the production code, the optimization normally works, except when the starting estimate is too good: the surface is exceedingly flat, and noise overwhelms the deltas used to numerically compute the Jacobian. In that case, the residuals of the objective function look like random noise, as in the code example above, and I would not expect the optimization to converge.

In this code example, small NOISE_LEVEL values (&lt;1e-10) always converge. At 1e-6, the ValueError is typically thrown within a few hundred iterations.
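One way to keep the noise from dominating the numerical Jacobian is to enlarge the finite-difference step via leastsq's epsfcn parameter. A minimal sketch (the value 1e-4 is an arbitrary choice; a suitable value depends on the actual noise level):

```python
import numpy as np
import scipy.optimize

xF = np.array([1.0, 2.0, 3.0, 4.0])  # Target value for fit
NOISE_LEVEL = 1e-6                   # The random noise level
RETURN_LEN = 1000                    # Objective function return vector length

def func(x):
    v = np.random.rand(RETURN_LEN) * NOISE_LEVEL
    v[:len(x)] += xF - x
    return v

# epsfcn controls the relative step used for the forward-difference Jacobian.
# Choosing it well above the noise level keeps the finite differences
# dominated by the true slope rather than by the noise.
x0 = np.zeros(len(xF))
y, ier = scipy.optimize.leastsq(func, x0, epsfcn=1e-4)
print(y)
```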

One possible workaround is to return a highly penalized residual vector (all NaN or all inf) when the input contains NaN, such as:

v = np.empty(RETURN_LEN)
v.fill(np.nan)
return v

This workaround seems to be effective if dirty. Any better alternatives or ways to prevent NaNs in the first place?
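A variant of the same idea that avoids feeding NaN back into MINPACK is to return a large but finite penalty residual for invalid inputs, so the solver treats that point as very poor rather than propagating NaN. A sketch (the 1e12 penalty value is an arbitrary choice):

```python
import numpy as np
import scipy.optimize

xF = np.array([1.0, 2.0, 3.0, 4.0])  # Target value for fit
NOISE_LEVEL = 1e-6                   # The random noise level
RETURN_LEN = 1000                    # Objective function return vector length

def func(x):
    # Guard: penalize invalid inputs with a large finite residual instead
    # of raising or returning NaN/inf.
    if not np.all(np.isfinite(x)):
        return np.full(RETURN_LEN, 1e12)
    v = np.random.rand(RETURN_LEN) * NOISE_LEVEL
    v[:len(x)] += xF - x
    return v

x0 = np.zeros(len(xF))
y, ier = scipy.optimize.leastsq(func, x0)
print(y)
```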

This behavior was observed under Python 2.7.9 x32 running on Windows 7.

asked Mar 31 '15 by Matt Liberty


1 Answer

leastsq (Levenberg-Marquardt) estimates the Jacobian by finite differences, so once the noise is comparable to the difference step, the Jacobian estimate is dominated by noise and the solver breaks down, unless the noise level is below the threshold of noticeability.

One way around this is to use a different nonlinear solver, for instance the broyden1 root-finding algorithm instead of leastsq:

import scipy.optimize
import numpy as np

xF = np.array([1.0, 2.0, 3.0, 4.0]) # Target value for fit
NOISE_LEVEL = 1.e-6 # The random noise level
RETURN_LEN = 1000  # The objective function return vector length

def func(x):
    if np.isnan(np.sum(x)):
        raise ValueError('Invalid x: %s' % x)
    v = np.random.rand(RETURN_LEN) * NOISE_LEVEL

    v[:len(x)] += xF - x

    return v[:len(x)]  # broyden1 solves F(x) = 0, so len(F(x)) must equal len(x)

iteration = 0
while iteration < 10:
    iteration += 1
    x = np.random.rand(len(xF))
    y = scipy.optimize.broyden1(func, x)
    print('%04d %s' % (iteration, y))

which prints, for example:

0001 [ 1.00000092  2.00000068  3.00000051  4.00000097]
0002 [ 1.0000012   2.00000214  3.00000272  4.00000369]
0003 [ 0.99999991  1.99999931  2.99999815  3.9999978 ]
0004 [ 1.00000097  2.00000198  3.00000345  4.00000425]
0005 [ 1.00000047  1.99999983  2.99999938  3.99999922]
0006 [ 1.00000024  2.00000021  3.00000071  4.00000136]
0007 [ 1.00000116  2.00000102  3.00000225  4.00000357]
0008 [ 1.00000006  2.00000002  3.00000017  4.00000039]
0009 [ 1.0000002   2.00000034  3.00000062  4.00000051]
0010 [ 1.00000137  2.0000015   3.00000193  4.00000344]
answered Oct 15 '22 by Uvar