I have a set of numbers to compare. Let's say that I have to get this comparison from a user. I have a choice of either asking him a question consisting of 2 numbers or 3 numbers of 4 numbers. For instance, I can ask any of the following questions:
My goal is to minimize the number of questions I ask the user in some combination that will ultimately give me an ordering of all the numbers in the set... For instance, if there were just 4 numbers {2,3,4,5}, I could've asked him the third type of question where I give him 4 numbers to compare. But in the ecosystem I am designing this for, the user gets annoyed with long questions so I would want to minimize the number of questions of this type that I would ask. So if each of the questions has a particular weight, I a trying to figure out a way to ultimately get the ordering of all numbers but at the same time pose the user with minimum trouble.
Is there a good way of solving this problem? Does anyone see this as belonging to a general class of problems or am I just making it too complex? Any suggestions?
Let's experiment, would you.
1. Example
Let's look at {A,B,C,D}
and sort it.
Solution 1: By sets
{A,B,C,D}
-> B
(thus B>A
, B>C
and B>D
){A,C,D}
-> D
(thus D>A
and D>C
){A,C}
-> A
(thus A>C
)Total order
[B,D,A,C]
Solution 2: By pairs
A
and C
-> A
(thus A>C
)B
and D
-> B
(thus B>D
)D
and A
-> D
Total order
[B,D,A,C]
What's the catch ? Obviously it's more difficult by pairs, here I chose them so that the merge would be easy (none).
2. Remarks
a) Total ordering
The >
only work so well with a total ordering: ie for two given elements A
and B
of a set either A>B
or B>A
. If none of those two relations hold, then A
and B
are equivalent.
The problem with the Solution 1 approach is that if you present the user with {A,B,C,D}
and A
and B
are equivalent and greater than C
and D
... what's the answer supposed to be ?
b) Transitivity
The >
relationship is transitive, meaning that if A>B
and B>C
then A>C
. It's important to use this fact since you can therefore deduce relationships between two elements without ever asking the user.
c) What's the goal ?
Is the goal supposed to minimize the number of questions to the user or supposed to minimize the user's work ? Because obviously it's more difficult for a user to answer questions from the first solution...
3. Modeling
One can model the problem as a "graph" problem.
You begin with a set of nodes {A,B,C,D}
which represent the values you want to test.
Sorting the set is equivalent to computing the minimum set of oriented edges that link these nodes so that given any two nodes, a path leads from one to the other. I do insist of minimum.
For example, if I have 2 edges: B>A
and B>C
, then if I discover than A>C
I shall remove B>C
because I can deduce it by transitivity. The important properties is that if no two nodes are equivalent, the cardinal of the resulting set of edges is the cardinal of the set of nodes minus 1. In my example (given 4 nodes) it would be 3.
An oracle (or an extremely lucky guy) would thus be able to only ask 3 questions, and yet build this graph... that's what we should strive for.
4. How to solve this problem ?
Okay, let's suppose that we have no 2 equivalent elements. This means that if A>B
is false, then I can deduce that B>A
...
For the representation of our little graph, let's take an array:
A B C D
D . > . # . represent the unknown
C . >
B > # < and > have their usual meaning...
A
Because of the symmetry we only need a triangle.
By counting the number .
we can see the number of unknown relationships, the ideal solution is to get rid of all of those .
while asking as few questions as possible to the user.
A good question is therefore one than remove as much .
as possible in one swoop.
5. My question
And thus, I have another type of question:
Select the elements lower than "D" in the following {A,B,C}: _
This question feels better that the What is the greater... because I explicitly target those relationships I wish to know (D?A
, D?B
and D?C
) while the greater guarantees that I will obtain as much relationships but I can't know which in advance.
I try to maximize the usefulness of the question
Of course, it's no leeway for laziness: it's still important to take transitivity into account while exploiting the results of the questions.
6. Exploring larger sets
With larger sets, you can group the elements, and then merge them later on, but the merge is always a messy operation: you will probably get answers you already knew. However it's a great practice for my little question:
Given 2 ordered (disjoint) sets: [A,B,C,D,...]
and [R,S,T,U,...]
let's see the 3 questions of the toolbox:
A
or R
? _{A,B,R,S}
? _Which elements are greater than A
in {R,S,T}
? _
Gives 1 relationship
The third question asks for a more elaborate answer, but it is much more suitable for merge situations. In the merge case, it's as efficient as asking to sort the elements, since it asks precisely for the only 3 relationships we don't know in the board.
7. Conclusion
You now have 4 questions in your box:
(Pair can be regarded as a degenerated case of either Greater or Pivot)
Given that each question gives n-1
relationships for n
nodes, you could try to gauge how much time it takes for a user to answer a question T(n)
and then find the n
that maximize T(n)/n
;)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With