 

Algorithm for selecting n vectors out of a set while minimizing cost

assuming we have:

  • a set U of n-dimensional vectors (each vector v = < x1, x2, ..., xn >)
  • a constraint n-dimensional vector c = < x1, ..., xn >
  • an n-dimensional vector of weights w = < x1, ..., xn >
  • an integer S

I need an algorithm that selects S vectors from U into a set R while minimizing the function cost(R):

cost(R) = sum(abs(c-sumVectors(R))*w)

(sumVectors is a function that sums vectors component-wise, e.g. sumVectors({< 1,2 >; < 3,4 >}) = < 4,6 >, while sum(< 1, 2, 3 >) returns the scalar 6.)
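The cost function above can be written directly in a few lines of NumPy; this is a sketch of my reading of the definition, where R is assumed to be a list of indices into U (the names are illustrative):

```python
import numpy as np

def cost(R, U, c, w):
    """cost(R) = sum(abs(c - sumVectors(R)) * w).

    R is a list of row indices into the 2-D array U, so U[R].sum(axis=0)
    is the component-wise sumVectors(R) from the question.
    """
    total = U[R].sum(axis=0)
    return np.sum(np.abs(c - total) * w)
```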

The solution does not have to be optimal; I just need the best guess I can get within a preset time budget.

Any idea where to start? (Preferably something faster/smarter than genetic algorithms)

Arg asked Feb 19 '23 19:02

2 Answers

This is an optimization problem. Since you don't need the optimal solution, you can try a stochastic optimization method, e.g., Hill Climbing, in which you start with a random solution (a random size-S subset R of U) and look at the set of neighboring solutions (obtained by adding, removing, or swapping one component of the current solution) for one that is better with respect to the cost function.
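A minimal sketch of that loop, assuming R is a set of indices into U and the neighborhood is "swap one selected vector for one unselected vector" (this is not the linked gist's code; all names are illustrative):

```python
import random
import numpy as np

def cost(R, U, c, w):
    # cost(R) = sum(abs(c - sumVectors(R)) * w), as defined in the question
    return np.sum(np.abs(c - U[list(R)].sum(axis=0)) * w)

def hill_climb(U, c, w, S, seed=0):
    rng = random.Random(seed)
    n = len(U)
    R = set(rng.sample(range(n), S))        # random initial solution
    cur = cost(R, U, c, w)
    while True:
        # scan all neighbors: swap one selected index for an unselected one
        best_move, best_c = None, cur
        for out in R:
            for cand in set(range(n)) - R:
                R2 = (R - {out}) | {cand}
                c2 = cost(R2, U, c, w)
                if c2 < best_c:
                    best_move, best_c = R2, c2
        if best_move is None:               # local minimum: no neighbor improves
            return sorted(R), cur
        R, cur = best_move, best_c
```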

To get a better solution, you can also add Simulated Annealing to your hill-climbing search. The idea is that it is sometimes necessary to move to a worse solution in order to reach a better one later. Simulated Annealing works better because it allows moves to worse solutions near the beginning of the process, and becomes less and less likely to accept a worse solution as the process goes on.
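A sketch of that annealing variant, using the same swap neighborhood; the linear cooling schedule and starting temperature are illustrative choices, not taken from the gist:

```python
import math
import random
import numpy as np

def cost(R, U, c, w):
    # cost(R) = sum(abs(c - sumVectors(R)) * w), as defined in the question
    return np.sum(np.abs(c - U[list(R)].sum(axis=0)) * w)

def anneal(U, c, w, S, n_steps=5000, t0=100.0, seed=0):
    rng = random.Random(seed)
    n = len(U)
    R = set(rng.sample(range(n), S))
    cur = cost(R, U, c, w)
    best, best_cost = set(R), cur
    for step in range(n_steps):
        t = t0 * (1.0 - step / n_steps) + 1e-9    # temperature decays toward 0
        out = rng.choice(sorted(R))                # propose one random swap
        cand = rng.choice([i for i in range(n) if i not in R])
        R2 = (R - {out}) | {cand}
        c2 = cost(R2, U, c, w)
        # always accept improvements; accept worse moves with probability
        # exp(-delta / t), which shrinks as the temperature decays
        delta = c2 - cur
        if delta < 0 or rng.random() < math.exp(-delta / t):
            R, cur = R2, c2
            if cur < best_cost:
                best, best_cost = set(R), cur
    return sorted(best), best_cost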

I paste some sample hill climbing python code to solve your problem here: https://gist.github.com/921f398d61ad351ac3d6

In my sample code, R always holds a list of the index into U, and I use euclidean distance to compare the similarity between neighbors. Certainly you can use other distance functions that satisfy your own needs. Also note in the code, I am getting neighbors on the fly. If you have a large pool of vectors in U, you might want to cache the pre-computed neighbors or even consider locality sensitive hashing to avoid O(n^2) comparison. Simulated Annealing can be added onto the above code.

The result of one random run is shown below.

I use only 20 vectors in U and S=10, so that I can compare the result with an optimal solution. The hill-climbing process stops at the 4th step, when no better neighbor (obtained by replacing one element with one of its k nearest neighbors) is available.

I also ran an exhaustive search that iterates over all possible combinations. You can see that the hill-climbing result is pretty good compared with the exhaustive approach: it takes only 4 steps to reach a relatively low cost (a local minimum, though) that the exhaustive search needs more than 82K steps to beat.

initial R [1, 3, 4, 5, 6, 11, 13, 14, 15, 17]
hill-climbing cost at step      1: 91784
hill-climbing cost at step      2: 89574
hill-climbing cost at step      3: 88664
hill-climbing cost at step      4: 88503
exhaustive search cost at step      1: 94165
exhaustive search cost at step      2: 93888
exhaustive search cost at step      4: 93656
exhaustive search cost at step      5: 93274
exhaustive search cost at step     10: 92318
exhaustive search cost at step     44: 92089
exhaustive search cost at step     50: 91707
exhaustive search cost at step     84: 91561
exhaustive search cost at step     99: 91329
exhaustive search cost at step    105: 90947
exhaustive search cost at step    235: 90718
exhaustive search cost at step    255: 90357
exhaustive search cost at step   8657: 90271
exhaustive search cost at step   8691: 90129
exhaustive search cost at step   8694: 90048
exhaustive search cost at step  19637: 90021
exhaustive search cost at step  19733: 89854
exhaustive search cost at step  19782: 89622
exhaustive search cost at step  19802: 89261
exhaustive search cost at step  20097: 89032
exhaustive search cost at step  20131: 88890
exhaustive search cost at step  20134: 88809
exhaustive search cost at step  32122: 88804
exhaustive search cost at step  32125: 88723
exhaustive search cost at step  32156: 88581
exhaustive search cost at step  69336: 88506
exhaustive search cost at step  82628: 88420
greeness answered Feb 21 '23 10:02


You're going to need to check the costs of all possible sets R and minimise. If you choose vectors in a stepwise fashion, minimising cost at each addition, you may not find the set with minimum cost. If the set U of vectors is very large and exhaustive computation is too slow, you may be forced to use a stepwise method.
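The stepwise method described here can be sketched as a greedy loop: at each of the S steps, add the vector whose inclusion yields the lowest cost so far. It needs only O(S * |U|) cost evaluations but, as noted, can miss the global minimum (the names below are illustrative):

```python
import numpy as np

def greedy_select(U, c, w, S):
    """Stepwise greedy selection: repeatedly add the single vector that
    minimizes cost(R) = sum(abs(c - sumVectors(R)) * w) at that step."""
    chosen = []
    total = np.zeros_like(c, dtype=float)   # running sumVectors(chosen)
    remaining = set(range(len(U)))
    for _ in range(S):
        best_i = min(sorted(remaining),
                     key=lambda i: np.sum(np.abs(c - (total + U[i])) * w))
        chosen.append(best_i)
        total += U[best_i]
        remaining.remove(best_i)
    return chosen
```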

Olivia Grigg answered Feb 21 '23 10:02