Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding most distinct elements in a set

Say we have a perfume shop that has 100 different perfumes.

Let's say 10,000 customers come in an rate each perfume one through five stars.

Let's say the question is: "how to best construct a pack of 5 perfumes so that 95% customers will give a 4+ star rating for at least one of them"

How to do this algorithmically?

NOTE: I can see that even the question isn't properly formed; there's no guarantee that such a construction even exists. There is a trade-off between 2 parameters.

NOTE: Also, (and this makes the perfume analogy becomes slightly artificial), it doesn't matter whether we get one good match or three good matches. So {4.3, 0, 0, 0, 0} would be equivalent to {4.3, 4.2, 4.2, 4.2, 4.2} -- in both cases the score is 4.3.

Let's say for the purpose of argument that perfumes 0-19 are sweet, perfumes 20-39 are sour, etc (sim. salt, bitter, unami)

So there would be very high crosscorrelation between 0-19.

If you modelled this with 100 points in space, then 0-19 would all attract each other very strongly, they would form a cluster.

Similarly you would get 4 other clusters for the other four tastes.

So from just one metric, we have separated out 5 distinct flavours.

But does this technique extend?

π

PS just giving the names of related techniques would be very helpful, as this would allow me to Google for further information. So any answer that just restates the question in industry accepted terminology would be useful!

like image 213
P i Avatar asked Oct 20 '22 22:10

P i


1 Answers

This algorithm should find a solution to the problem:

  1. Order the perfumes by the number of customers giving a 4+ rating
  2. Choose the first perfume not concidered yet from the list
  3. Delete the ratings from the customers now satisfied.
  4. Repeat the process for perfumes 2 - 5 in the pack.

Backtrace when neccessary to obtain a selection satisfying the criterion.

like image 131
Terje D. Avatar answered Oct 27 '22 09:10

Terje D.