Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm: Is there a good way of solving a comparison?

I have a set of numbers to compare. Let's say that I have to get this comparison from a user. I have a choice of either asking him a question consisting of 2 numbers or 3 numbers of 4 numbers. For instance, I can ask any of the following questions:

  • Which of the numbers is greater? 2 OR 4
  • Which of the numbers is greater? 2 OR 3 OR 4
  • Which of the numbers if greater? 2 OR 3 OR 4 OR 5

My goal is to minimize the number of questions I ask the user in some combination that will ultimately give me an ordering of all the numbers in the set... For instance, if there were just 4 numbers {2,3,4,5}, I could've asked him the third type of question where I give him 4 numbers to compare. But in the ecosystem I am designing this for, the user gets annoyed with long questions so I would want to minimize the number of questions of this type that I would ask. So if each of the questions has a particular weight, I a trying to figure out a way to ultimately get the ordering of all numbers but at the same time pose the user with minimum trouble.

Is there a good way of solving this problem? Does anyone see this as belonging to a general class of problems or am I just making it too complex? Any suggestions?

like image 588
Legend Avatar asked Feb 27 '23 13:02

Legend


1 Answers

Let's experiment, would you.

1. Example

Let's look at {A,B,C,D} and sort it.

Solution 1: By sets

  • Greater of {A,B,C,D} -> B (thus B>A, B>C and B>D)
  • Greater of {A,C,D} -> D (thus D>A and D>C)
  • Greater of {A,C} -> A (thus A>C)

Total order [B,D,A,C]

Solution 2: By pairs

  • Greater between A and C -> A (thus A>C)
  • Greater between B and D -> B (thus B>D)
  • Greater between D and A -> D

Total order [B,D,A,C]

What's the catch ? Obviously it's more difficult by pairs, here I chose them so that the merge would be easy (none).

2. Remarks

a) Total ordering

The > only work so well with a total ordering: ie for two given elements A and B of a set either A>B or B>A. If none of those two relations hold, then A and B are equivalent.

The problem with the Solution 1 approach is that if you present the user with {A,B,C,D} and A and B are equivalent and greater than C and D... what's the answer supposed to be ?

b) Transitivity

The > relationship is transitive, meaning that if A>B and B>C then A>C. It's important to use this fact since you can therefore deduce relationships between two elements without ever asking the user.

c) What's the goal ?

Is the goal supposed to minimize the number of questions to the user or supposed to minimize the user's work ? Because obviously it's more difficult for a user to answer questions from the first solution...

3. Modeling

One can model the problem as a "graph" problem.

You begin with a set of nodes {A,B,C,D} which represent the values you want to test.

Sorting the set is equivalent to computing the minimum set of oriented edges that link these nodes so that given any two nodes, a path leads from one to the other. I do insist of minimum.

For example, if I have 2 edges: B>A and B>C, then if I discover than A>C I shall remove B>C because I can deduce it by transitivity. The important properties is that if no two nodes are equivalent, the cardinal of the resulting set of edges is the cardinal of the set of nodes minus 1. In my example (given 4 nodes) it would be 3.

An oracle (or an extremely lucky guy) would thus be able to only ask 3 questions, and yet build this graph... that's what we should strive for.

4. How to solve this problem ?

Okay, let's suppose that we have no 2 equivalent elements. This means that if A>B is false, then I can deduce that B>A...

For the representation of our little graph, let's take an array:

  A B C D  
D . > .      # . represent the unknown
C . >        
B >          # < and > have their usual meaning...
A 

Because of the symmetry we only need a triangle.

By counting the number . we can see the number of unknown relationships, the ideal solution is to get rid of all of those . while asking as few questions as possible to the user.

A good question is therefore one than remove as much . as possible in one swoop.

5. My question

And thus, I have another type of question:

Select the elements lower than "D" in the following {A,B,C}: _

This question feels better that the What is the greater... because I explicitly target those relationships I wish to know (D?A, D?B and D?C) while the greater guarantees that I will obtain as much relationships but I can't know which in advance.

I try to maximize the usefulness of the question

Of course, it's no leeway for laziness: it's still important to take transitivity into account while exploiting the results of the questions.

6. Exploring larger sets

With larger sets, you can group the elements, and then merge them later on, but the merge is always a messy operation: you will probably get answers you already knew. However it's a great practice for my little question:

Given 2 ordered (disjoint) sets: [A,B,C,D,...] and [R,S,T,U,...] let's see the 3 questions of the toolbox:

  1. Which is greater: A or R ? _
  2. Which is the greatest element of {A,B,R,S} ? _
  3. Which elements are greater than A in {R,S,T} ? _

  4. Gives 1 relationship

  5. Gives 2 relationships: 3 of which 1 is already known
  6. Gives 3 relationships

The third question asks for a more elaborate answer, but it is much more suitable for merge situations. In the merge case, it's as efficient as asking to sort the elements, since it asks precisely for the only 3 relationships we don't know in the board.

7. Conclusion

You now have 4 questions in your box:

  • Sort: Sort the following elements from greater to lower {A,B,C,D} ? _
  • Pivot: Which elements are greater than A in this set {B,C,D} ? _
  • Greater: Which element is the greater in this set {A,B,C,D} ? _
  • Pair: Which is the greater element among A and B ? _

(Pair can be regarded as a degenerated case of either Greater or Pivot)

Given that each question gives n-1 relationships for n nodes, you could try to gauge how much time it takes for a user to answer a question T(n) and then find the n that maximize T(n)/n ;)

like image 132
Matthieu M. Avatar answered Mar 08 '23 07:03

Matthieu M.