Given a set of 2D points (black dots in the picture) I need to choose two lines to somehow represent those points. I probably need to minimize the sum of squares of [distance of x
to the closer of two lines]^2, although if any other metric makes this easier, this is fine too.
The obvious but ineffective approach is to try min squares over all 2^n partitions. A practical approach is probably iterative improvement, maybe starting with a random partition.
Is there any research on algorithms to handle this problem?
I think this can be formulated as an explicit optimization problem:
min sum(j, r1(j)^2 + r2(j)^2) (quadratic)
subject to
r1(j) = (y(j) - a0 - a1*x(j)) * δ(j) (quadratic)
r2(j) = (y(j) - b0 - b1*x(j)) * (1-δ(j)) (quadratic)
δ(j) ∈ {0,1}
We do the assignment of points to a line and the regression (minimization of the sum of the squared residuals) at the same time.
This is a non-convex quadratically constrained mixed-integer quadratic programming problem. Solvers that can handle this include Gurobi, Baron, Couenne, Antigone.
We can reformulate this a bit:
min sum(j, r(j)^2) (convex quadratic)
subject to
r(j) = y(j) - a0 - a1*x(j) + s1(j) (one of these will be relaxed)
r(j) = y(j) - b0 - b1*x(j) + s2(j) (all linear)
-U*δ(j) <= s1(j) <= U*δ(j)
-U*(1-δ(j)) <= s2(j) <= U*(1-δ(j))
δ(j) ∈ {0,1}
s1(j),s2(j) ∈ [-U,U]
U = 1000 (reasonable bound, constant)
This makes it a straight convex MIQP model. This will allow more solvers to be used (e.g. Cplex) and it is much easier to solve. Some other formulations are here. Some of the models mentioned do not need the bounds I had to use in the above big-M formulation. It is noted these models deliver proven optimal solutions (for the non-convex models this would require a global solver; the convex models are easier and don't need this). Furthermore, instead of a least squares objective, we can also form an L1 objective. In the latter case we end up with a linear MIP model.
A small test confirms this works:
This problem has 50 points, and needed 1.25 seconds using Cplex's MIQP solver on a slow laptop. This may be a case of a statistical problem where MIP/MIQP methods have something to offer.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With