I'm trying to fit a monotonic curve to some nearly-monotonic data. (The X values are monotonic, and the Y values should be monotonic but the noise is often bigger than the change in the underlying value from point to point.) Here's a summary of what I'm doing at the moment:
def goodness_of_fit(Xfit):
    assert is_sorted(Xfit)
    # Calculate the area between the fit line and the join-the-dots line from the data

scipy.optimize.minimize(goodness_of_fit, x0=numpy.linspace(xmin, xmax))
I can't find a way to get the optimisation algorithm to keep the Xfit array sorted - does anyone have any suggestions? (The array is too large for it to be feasible to create N-1 individual ordering constraints and use the constrained optimisation functions.) I'm willing to use a language other than Python if the best solution is only available elsewhere.
(N.B. I am indeed fitting the X values, not the Y values - this is because I eventually want to plot the dX/dY curve and have it not blow up to ridiculous values as it does if I plot it from the raw data. However, if it's much easier to fit the Y values on fixed X values I could do that instead.)
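For reference, the "area between the fit line and the join-the-dots line" objective can be sketched as a trapezoid-rule integral of the gap between the two polylines. This is only a guess at the setup: the signature and the `Yfit`/`Ydata` arrays are hypothetical, and crossings between grid points are only approximated.

```python
import numpy as np

def goodness_of_fit(Xfit, Yfit, Xdata, Ydata):
    # Sample both piecewise-linear curves onto the union of their x grids,
    # then integrate |fit - data| with the trapezoid rule.
    grid = np.union1d(Xfit, Xdata)
    gap = np.abs(np.interp(grid, Xfit, Yfit) - np.interp(grid, Xdata, Ydata))
    return float(np.sum(np.diff(grid) * (gap[1:] + gap[:-1]) / 2.0))
```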
How about creating a new array from Xfit that is a strictly-monotonic subset and then fitting the curve to that? Something like:
import numpy as np

Xfit = np.hstack(((-1, -1, 0, 1, 1), np.arange(1, 10), (-7, 9, 10, 10)))

len_mono = 0
Xfit_mono = np.zeros(Xfit.size)
Xfit_mono_ind = np.zeros(Xfit.size, dtype=int)  # integer dtype so it can index y later
Xfit_mono[len_mono] = Xfit[0]
for iX, x in enumerate(Xfit):
    if iX > 0:
        if x > Xfit_mono[len_mono]:
            len_mono = len_mono + 1
            Xfit_mono[len_mono] = x
            Xfit_mono_ind[len_mono] = iX
Xfit_mono_ind = Xfit_mono_ind[:len_mono + 1]
Xfit_mono = Xfit_mono[:len_mono + 1]
print(Xfit_mono_ind)
print(Xfit_mono)
Then you can use Xfit_mono in your curve fitting instead, and when you want to select the associated y values, use the entries of Xfit_mono_ind as indices into y.
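The same strictly-increasing subset can also be built without the explicit loop, which makes picking out the matching y values a one-liner. `Y` below is a hypothetical stand-in for the measured data aligned with `Xfit`:

```python
import numpy as np

Xfit = np.hstack(((-1, -1, 0, 1, 1), np.arange(1, 10), (-7, 9, 10, 10)))
Y = np.arange(Xfit.size, dtype=float)  # hypothetical y data aligned with Xfit

# Keep the first point plus every point that strictly exceeds all points before it
running_max = np.maximum.accumulate(Xfit)
keep = np.ones(Xfit.size, dtype=bool)
keep[1:] = Xfit[1:] > running_max[:-1]

Xfit_mono = Xfit[keep]   # -1, 0, 1, 2, ..., 9, 10
Y_mono = Y[keep]         # y values matching the monotonic x subset
```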
Update replaces:
if x > Xfit[iX-1]:
with:
if x > Xfit_mono[len_mono]:
The old version worked for weakly-increasing input, but gave the wrong answer if the sequence ever decreased. This update gives the desired result in both cases.
In the end I did it by fitting the differences between successive X values rather than the values themselves - I can then use a simple lower bound of zero.
import numpy as np
import scipy.optimize

def fg(X):
    # return the objective function and its Jacobian (gradient with respect to X)
    ...

def adjusted_fg(X_diff):
    X = np.cumsum(X_diff)
    score, jac = fg(X)
    # Chain rule through the cumsum: X[j] depends on every X_diff[i] with i <= j,
    # so the gradient w.r.t. X_diff is the reversed cumulative sum of jac.
    jac = np.cumsum(jac[::-1])[::-1]
    return score, jac

X0_diff = np.concatenate(([X0[0]], np.diff(X0)))
bounds = [(None, None)] + [(0, None)] * (len(X0) - 1)
scipy.optimize.minimize(adjusted_fg, X0_diff, jac=True, method='L-BFGS-B', bounds=bounds)
For large N, this tends to be unstable and/or get stuck in a local minimum if not started close to the solution, so I'm going to try fitting for a smaller N (e.g. N/10) first, then interpolating to get a better X0 for the large N fit.
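That coarse-to-fine warm start might look like the following sketch, where `X_coarse` stands in for the solution of a smaller minimize() run and `np.interp` lifts it onto the full-size grid (all sizes and values here are illustrative):

```python
import numpy as np

N = 1000
N_coarse = N // 10

# Stand-in for the result of fitting with N_coarse points; any sorted array works.
X_coarse = np.linspace(0.0, 5.0, N_coarse) ** 2

# Interpolate the coarse solution onto the full-size grid to get a warm start X0.
# np.interp preserves (weak) monotonicity, so X0 is a valid starting point.
t_fine = np.linspace(0.0, 1.0, N)
t_coarse = np.linspace(0.0, 1.0, N_coarse)
X0 = np.interp(t_fine, t_coarse, X_coarse)
```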
(N.B. For my problem I actually want strict ordering, so I used a positive lower bound rather than 0.)