 

Most mutually distant k elements (clustering?)

I have a simple machine learning question:

I have n (~110) elements and a matrix of all the pairwise distances. I would like to choose the 10 elements that are farthest apart. That is, I want to

Maximize:
  Choose 10 different elements.
  Return min distance over (all pairings within the 10).

My distance metric is symmetric and respects the triangle inequality.
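Concretely, the objective above could be scored like this (a sketch of my own, not part of the original question; D is the full symmetric distance matrix):

```python
import numpy as np
from itertools import combinations

def subset_score(D, subset):
    """The quantity to maximize: the minimum pairwise distance
    within the chosen subset of indices."""
    return min(D[i, j] for i, j in combinations(subset, 2))
```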

What kind of algorithm can I use? My first instinct is to do the following:

  1. Cluster the n elements into 20 clusters.
  2. Replace each cluster with just the element of that cluster that is furthest from the mean element of the original n.
  3. Use brute force to solve the problem on the remaining 20 candidates. Luckily, 20 choose 10 is only 184,756.
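The brute-force step (3) might look like this sketch (my own illustration, not the asker's code; it exhaustively scores every size-k subset of the surviving candidates):

```python
import itertools
import numpy as np

def best_subset(D, candidates, k):
    """Exhaustively score every size-k subset of the candidate indices
    and return the one maximizing the minimum pairwise distance."""
    best_val, best_sub = -np.inf, None
    for sub in itertools.combinations(candidates, k):
        idx = list(sub)
        # min over all pairs within the subset (strict upper triangle)
        val = D[np.ix_(idx, idx)][np.triu_indices(k, 1)].min()
        if val > best_val:
            best_val, best_sub = val, sub
    return best_sub, best_val
```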

Edit: thanks to etarion's insightful comment, changed "Return sum of (distances)" to "Return min distance" in the optimization problem statement.

AlcubierreDrive asked Mar 23 '11


2 Answers

Here's how you might approach this combinatorial optimization problem by taking the convex relaxation.

Let D be an upper triangular matrix with your distances on the upper triangle. That is, for i < j, D_i,j is the distance between elements i and j. (Presumably, you'll have zeros on the diagonal as well.)

Then your objective is to maximize x'*D*x, where x is binary valued with 10 elements set to 1 and the rest to 0. (Setting the ith entry in x to 1 is analogous to selecting the ith element as one of your 10 elements.)

The "standard" convex optimization thing to do with a combinatorial problem like this is to relax the constraints such that x need not be discrete valued. Doing so gives us the following problem:

maximize y'*D*y subject to: 0 <= y_i <= 1 for all i, 1'*y = 10

This is (morally) a quadratic program. (If we replace D with D + D', it'll become a bona fide quadratic program and the y you get out should be no different.) You can use an off-the-shelf QP solver, or just plug it in to the convex optimization solver of your choice (e.g. cvx).

The y you get out need not be (and probably won't be) a binary vector, but you can convert the scalar values to discrete ones in a bunch of ways. (The simplest is probably to let x be 1 in the 10 entries where y_i is highest, but you might need to do something a little more complicated.) In any case, y'*D*y with the y you get out does give you an upper bound for the optimal value of x'*D*x, so if the x you construct from y has x'*D*x very close to y'*D*y, you can be pretty happy with your approximation.
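Not from the answer itself, but here's one way the relaxed problem might be set up numerically (a sketch assuming scipy; since maximizing y'*D*y with nonnegative D is generally non-concave, a general-purpose solver like SLSQP only returns a local solution, so treat the result as a heuristic):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, k = 30, 5                       # small stand-in for n=110, k=10
pts = rng.random((n, 2))
dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
D = np.triu(dist)                  # upper-triangular, zero diagonal

# Relaxed problem: maximize y'*D*y  s.t.  0 <= y_i <= 1,  sum(y) = k.
# SLSQP minimizes, so negate; the gradient symmetrizes D as D + D'.
obj = lambda y: -y @ D @ y
grad = lambda y: -(D + D.T) @ y
cons = [{"type": "eq", "fun": lambda y: y.sum() - k,
         "jac": lambda y: np.ones(n)}]
res = minimize(obj, np.full(n, k / n), jac=grad,
               bounds=[(0.0, 1.0)] * n, constraints=cons, method="SLSQP")

# Simplest rounding: take the k entries with the largest y.
chosen = np.argsort(res.x)[-k:]
```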

Let me know if any of this is unclear, notation or otherwise.

Dan Robinson answered Oct 19 '22


Nice question.

I'm not sure if it can be solved exactly in an efficient manner, and your clustering-based solution seems reasonable. Another direction to look at would be local search methods such as simulated annealing and hill climbing.

Here's an obvious baseline I would compare any other solution against:

  1. Repeat 100 times (i.e. until only 10 elements remain):
  2. Greedily remove the datapoint whose removal decreases the objective function the least.
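The greedy baseline above could be sketched as follows (my own code, not the answerer's; D is the full symmetric distance matrix and the objective is the min pairwise distance among survivors):

```python
import numpy as np

def greedy_prune(D, k):
    """Baseline heuristic: repeatedly drop the point whose removal
    decreases the objective (minimum pairwise distance among the
    survivors) the least, until only k points remain.
    Not guaranteed optimal."""
    remaining = list(range(len(D)))
    while len(remaining) > k:
        best_score, best_i = -np.inf, None
        for i in remaining:
            rest = [j for j in remaining if j != i]
            sub = D[np.ix_(rest, rest)]
            # objective after removing i: min over strict upper triangle
            score = sub[np.triu_indices(len(rest), 1)].min()
            if score > best_score:
                best_score, best_i = score, i
        remaining.remove(best_i)
    return remaining
```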
carlosdc answered Oct 19 '22