Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fitting two lines to a set of 2D points

Given a set of 2D points (black dots in the picture) I need to choose two lines to somehow represent those points. I probably need to minimize the sum of squares of [distance of x to the closer of two lines]^2, although if any other metric makes this easier, this is fine too.

The obvious but ineffective approach is to try min squares over all 2^n partitions. A practical approach is probably iterative improvement, maybe starting with a random partition.

Is there any research on algorithms to handle this problem?

two lines

like image 836
Rafał Dowgird Avatar asked Feb 24 '21 10:02

Rafał Dowgird


1 Answers

I think this can be formulated as an explicit optimization problem:

 min sum(j, r1(j)^2 + r2(j)^2)                 (quadratic)
 subject to
     r1(j) = (y(j) - a0 - a1*x(j)) * δ(j)      (quadratic)
     r2(j) = (y(j) - b0 - b1*x(j)) * (1-δ(j))  (quadratic) 
     δ(j) ∈ {0,1}

We do the assignment of points to a line and the regression (minimization of the sum of the squared residuals) at the same time.

This is a non-convex quadratically constrained mixed-integer quadratic programming problem. Solvers that can handle this include Gurobi, Baron, Couenne, Antigone.

We can reformulate this a bit:

 min sum(j, r(j)^2)                      (convex quadratic)
 subject to
     r(j) = y(j) - a0 - a1*x(j) + s1(j)  (one of these will be relaxed)
     r(j) = y(j) - b0 - b1*x(j) + s2(j)  (all linear)
     -U*δ(j) <= s1(j) <= U*δ(j)
     -U*(1-δ(j)) <= s2(j) <= U*(1-δ(j))
     δ(j) ∈ {0,1}
     s1(j),s2(j) ∈ [-U,U]
     U = 1000                            (reasonable bound, constant)

This makes it a straight convex MIQP model. This will allow more solvers to be used (e.g. Cplex) and it is much easier to solve. Some other formulations are here. Some of the models mentioned do not need the bounds I had to use in the above big-M formulation. It is noted these models deliver proven optimal solutions (for the non-convex models this would require a global solver; the convex models are easier and don't need this). Furthermore, instead of a least squares objective, we can also form an L1 objective. In the latter case we end up with a linear MIP model.

A small test confirms this works:

enter image description here

This problem has 50 points, and needed 1.25 seconds using Cplex's MIQP solver on a slow laptop. This may be a case of a statistical problem where MIP/MIQP methods have something to offer.

like image 62
Erwin Kalvelagen Avatar answered Sep 22 '22 06:09

Erwin Kalvelagen