Consider the set of non-decreasing surjective (onto) functions from (-inf,inf) to [0,1]. (Typical CDFs satisfy this property.) In other words, for any real number x, 0 <= f(x) <= 1. The logistic function is perhaps the most well-known example.
We are now given some constraints in the form of a list of x-values and for each x-value, a pair of y-values that the function must lie between. We can represent that as a list of {x,ymin,ymax} triples such as
constraints = {{0, 0, 0}, {1, 0.00311936, 0.00416369}, {2, 0.0847077, 0.109064},
{3, 0.272142, 0.354692}, {4, 0.53198, 0.646113}, {5, 0.623413, 0.743102},
{6, 0.744714, 0.905966}}
Graphically that looks like this:
(source: yootles.com)
We now seek a curve that respects those constraints. For example:
(source: yootles.com)
Let's first try a simple interpolation through the midpoints of the constraints:
mids = ({#1, Mean[{#2,#3}]}&) @@@ constraints
f = Interpolation[mids, InterpolationOrder->0]
Plotted, f looks like this:
(source: yootles.com)
That function is not surjective. Also, we'd like it to be smoother. We can increase the interpolation order but now it violates the constraint that its range is [0,1]:
(source: yootles.com)
The goal, then, is to find the smoothest function that satisfies the constraints:
The first example I plotted above seems to be a good candidate but I did that with Mathematica's FindFit function assuming a lognormal CDF. That works well in this specific example but in general there need not be a lognormal CDF that satisfies the constraints.
To determine the best fit, you should examine both the graphical and numerical fit results. Determine the best fit by examining the graphs of the fits and residuals. The graphical fit results indicate that: The fits and residuals for the polynomial equations are all similar, making it difficult to choose the best one.
Curve fitting is one of the most powerful and most widely used analysis tools in Origin. Curve fitting examines the relationship between one or more predictors (independent variables) and a response variable (dependent variable), with the goal of defining a "best fit" model of the relationship.
Performing Curve Fitting in Python. Curve Fitting can be performed for the dataset using Python. Python provides an open-source library known as the SciPy package. This SciPy package involves a function known as the curve_fit() function used to curve fit through Non-Linear Least Squares.
I don't think you've specified enough criteria to make the desired CDF unique.
If the only criteria that must hold is:
then perhaps you could use Monotone Cubic Interpolation. This will give you a C^2 (twice continously differentiable) function which, unlike cubic splines, is guaranteed to be monotone when given monotone data.
This leaves open the question, exactly what data should you use to generate the monotone cubic interpolation. If you take the center point (mean) of each error bar, are you guaranteed that the resulting data points are monotonically increasing? If not, you might as well make some arbitrary choice to guarantee that the points you select are monotonically increasing (because the criteria does not force our solution to be unique).
Now what to do about the last data point? Is there an X which is guaranteed to be larger than any x in the constraints data set? Perhaps you can again make an arbitrary choice of convenience and pick some very large X and put (X,1) as the final data point.
Comment 1: Your problem can be broken into 2 sub-problems:
Comment 2: Here is a way to use monotonic cubic interpolation, and satisfy criteria 4 and 5:
The monotonic cubic interpolation (let's call it f
) maps R --> R.
Let CDF(x) = exp(-exp(f(x)))
. Then CDF: R --> (0,1)
. If we could find the appropriate f
, then by defining CDF
this way, we could satisfy criteria 4 and 5.
To find f
, transform the CDF constraints (x_0,y_0),...,(x_n,y_n)
using the transformation xhat_i = x_i
, yhat_i = log(-log(y_i))
. This is the inverse of the CDF
transformation. If the y_i
's were increasing, then the yhat_i
's are decreasing.
Now apply monotone cubic interpolation to the (x_hat,y_hat) data points to generate f
. Then finally, define CDF(x) = exp(-exp(f(x)))
. This will be a monotonically increasing function from R --> (0,1), which passes through the points (x_i,y_i).
This, I think, satisfies all the criteria 2--5. Criteria 1 is somewhat satisfied, though there certainly could exist smoother solutions.
I have found a solution that gives reasonable results for a variety of inputs. I start by fitting a model -- once to the low ends of the constraints, and again to the high ends. I'll refer to the mean of these two fitted functions as the "ideal function". I use this ideal function to extrapolate to the left and to the right of where the constraints end, as well as to interpolate between any gaps in the constraints. I compute values for the ideal function at regular intervals, including all the constraints, from where the function is nearly zero on the left to where it's nearly one on the right. At the constraints, I clip these values as necessary to satisfy the constraints. Finally, I construct an interpolating function that goes through these values.
My Mathematica implementation follows.
First, a couple helper functions:
(* Distance from x to the nearest member of list l. *)
listdist[x_, l_List] := Min[Abs[x - #] & /@ l]
(* Return a value x for the variable var such that expr/.var->x is at least (or
at most, if dir is -1) t. *)
invertish[expr_, var_, t_, dir_:1] := Module[{x = dir},
While[dir*(expr /. var -> x) < dir*t, x *= 2];
x]
And here's the main function:
(* Return a non-decreasing interpolating function that maps from the
reals to [0,1] and that is as close as possible to expr[var] without
violating the given constraints (a list of {x,ymin,ymax} triples).
The model, expr, will have free parameters, params, so first do a
model fit to choose the parameters to satisfy the constraints as well
as possible. *)
cfit[constraints_, expr_, params_, var_] :=
Block[{xlist,bots,tops,loparams,hiparams,lofit,hifit,xmin,xmax,gap,aug,bests},
xlist = First /@ constraints;
bots = Most /@ constraints; (* bottom points of the constraints *)
tops = constraints /. {x_, _, ymax_} -> {x, ymax};
(* fit a model to the lower bounds of the constraints, and
to the upper bounds *)
loparams = FindFit[bots, expr, params, var];
hiparams = FindFit[tops, expr, params, var];
lofit[z_] = (expr /. loparams /. var -> z);
hifit[z_] = (expr /. hiparams /. var -> z);
(* find x-values where the fitted function is very close to 0 and to 1 *)
{xmin, xmax} = {
Min@Append[xlist, invertish[expr /. hiparams, var, 10^-6, -1]],
Max@Append[xlist, invertish[expr /. loparams, var, 1-10^-6]]};
(* the smallest gap between x-values in constraints *)
gap = Min[(#2 - #1 &) @@@ Partition[Sort[xlist], 2, 1]];
(* augment the constraints to fill in any gaps and extrapolate so there are
constraints everywhere from where the function is almost 0 to where it's
almost 1 *)
aug = SortBy[Join[constraints, Select[Table[{x, lofit[x], hifit[x]},
{x, xmin,xmax, gap}],
listdist[#[[1]],xlist]>gap&]], First];
(* pick a y-value from each constraint that is as close as possible to
the mean of lofit and hifit *)
bests = ({#1, Clip[(lofit[#1] + hifit[#1])/2, {#2, #3}]} &) @@@ aug;
Interpolation[bests, InterpolationOrder -> 3]]
For example, we can fit to a lognormal, normal, or logistic function:
g1 = cfit[constraints, CDF[LogNormalDistribution[mu,sigma], z], {mu,sigma}, z]
g2 = cfit[constraints, CDF[NormalDistribution[mu,sigma], z], {mu,sigma}, z]
g3 = cfit[constraints, 1/(1 + c*Exp[-k*z]), {c,k}, z]
Here's what those look like for my original list of example constraints:
(source: yootles.com)
The normal and logistic are nearly on top of each other and the lognormal is the blue curve.
These are not quite perfect. In particular, they aren't quite monotone. Here's a plot of the derivatives:
Plot[{g1'[x], g2'[x], g3'[x]}, {x, 0, 10}]
(source: yootles.com)
That reveals some lack of smoothness as well as the slight non-monotonicity near zero. I welcome improvements on this solution!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With