Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Collaborative sorting algorithm based on 1 vs 1 choice

I don't know if this is a more mathematical object, but I lurked mathexchange and doesn't look algorithm oriented so I prefer to ask here.

I would like to know if the following problem, was already resolved:

let's say we have 10 objects and that we want to sort them preferences based. If the sort pertains a single person, no problem, we ask him to answer to our questions (using bubblesort or similar) and answering, after a bunch of questions, he will receive the final ranking.

Now let's say that there are 10 persons. And we want to make a global rank. It becomes difficult, and anyone can have its way to solve the problem (for example, asking for the "first favourite three" to everyone and assigning points, then make a ranking);

I would like to be more scientific and therefore more algorithmic, so, in other words, use bubble sort (whose implementation, is like a series of question 1vs1 objects and asking what's your favourite, then make a ranking) for the ten people, minimizing the questions to ask.

So we should have a way to global rank the objects, and in the meanwhile assigning to the people who will sort, major importance, and if possible, don't wait for anyoone making his ranking but on percentages and statistics basis.

Hope to have explained well my question, please if you don't feel it's for this group, let me know and transfer on another service. Thanks!

like image 708
il_maniscalco Avatar asked Aug 12 '15 22:08

il_maniscalco


2 Answers

You question is the subject of Arrow's Theorem. In short, what you are trying to do is impossible in general.

If you still want to try, I suggest using directed edges in a directed graph to represent preferences; something like majority prefers A to B, include edge A->B, and no edge in case of ties. If the result is a Directed Acyclic Graph, congratulations, you can order the items with a toposort. Otherwise use Tarjan's Algorithm to identify strongly connected components, which are the trouble spots.

In general, the best way out of this conundrum in my opinion is to obtain scores rather than ranking pairs of items. Then you just average the scores.

like image 168
Edward Doolittle Avatar answered Nov 09 '22 13:11

Edward Doolittle


After the unpromising results of my previous answer, I decided to get started on a practical aspect of the question: how to optimally ask the questions to establish a person's preference.

skipping unnecessary questions

If there are 10 items to order, there are 45 pairs of items which have to be compared. These 45 decisions make up a triangular matrix:

   0  1  2  3  4  5  6  7  8
1  >
2  >  <
3  <  >  =
4  =  >  <  =
5  >  <  <  <  >
6  <  >  >  <  >  <
7  <  >  <  =  =  <  >
8  <  <  =  >  =  <  <  <
9  =  >  >  <  <  >  >  =  >

In the worst case scenario, you'd have to ask a person 45 questions before you can fill out the whole matrix and know his ranking of the 10 items. However, if a person prefers item 1 to item 2, and item 2 to item 3, you can deduce that he prefers item 1 to item 3, and skip that question. In fact, in the best case scenario, just 9 questions will be enough to fill out the whole matrix.

Answering binary questions to deduce an item's place in an ordered list is very similar to filling a binary search tree; however, in a 10-item b-tree, the best-case scenario is 16 questions instead of our theoretical minimum of 9; so I decided to try and find another solution.

Below is an algorithm based on the triangular matrix. It asks the questions in random order, but after every answer it checks which other answers can be deduced, and avoids asking unnecessary questions.

In practice, the number of questions needed to fill out the 45-question matrix is on average 25.33, with 90.5% of instances in the 20-30 range, a minimum value of 12 and a maximum of 40 (tested on 100,000 samples, random question order, no "=" answers).
distribution: number of questions (random order)

When the questions are asked systematically (filling the matrix from top to bottom, left to right), the distribution is quite different, with a lower average of 24.44, a strange cutoff below 19, a few samples going up to the maximum of 45, and an obvious difference between odd and even numbers.
distribution: number of questions (systematic)

I wasn't expecting this difference, but it has made me realise that there are opportunities for optimisation here. I'm thinking of a strategy linked to the b-tree idea, but without a fixed root. That will be my next step. (UPDATE: see below)

function PrefTable(n) {
    this.table = [];
    for (var i = 0; i < n; i++) {
        this.table[i] = [];
        for (var j = 0; j < i; j++) {
            this.table[i][j] = null;
        }
    }

    this.addAnswer = function(x, y, pref, deduced) {
        if (x < y) {
            var temp = x; x = y; y = temp; pref *= -1;
        }
        if (this.table[x][y] == null) {
            this.table[x][y] = pref;
            if (! deduced) this.deduceAnswers();
            return true;
        }
        else if (this.table[x][y] != pref) {
            console.log("INCONSISTENT INPUT: " + x + ["<", "=", ">"][pref + 1] + y);
        }
        return false;
    }

    this.deduceAnswers = function() {
        do {
            var changed = false;
            for (var i = 0; i < this.table.length; i++) {
                for (var j = 0; j < i; j++) {
                    var p = this.table[i][j];
                    if (p != null) {
                        for (var k = 0; k < j; k++) {
                            var q = this.table[j][k];
                            if (q != null && p * q != -1) {
                                changed |= this.addAnswer(i, k, p == 0 ? q : p, true);
                            }
                        }
                        for (var k = i + 1; k < this.table.length; k++) {
                            var q = this.table[k][j];
                            if (q != null && p * q != 1) {
                                changed |= this.addAnswer(i, k, p == 0 ? -q : p, true);
                            }
                        }
                        for (var k = j + 1; k < i; k++) {
                            var q = this.table[i][k];
                            if (q != null && p * q != 1) {
                                changed |= this.addAnswer(j, k, p == 0 ? q : -p, true);
                            }
                        }
                    }
                }
            }
        }
        while (changed);
    }

    this.getQuestion = function() {
        var q = [];
        for (var i = 0; i < this.table.length; i++) {
            for (var j = 0; j < i; j++) {
                if (this.table[i][j] == null) q.push({a:i, b:j});
            }
        }
        if (q.length) return q[Math.floor(Math.random() * q.length)]
        else return null;
    }

    this.getOrder = function() {
        var index = [];
        for (i = 0; i < this.table.length; i++) index[i] = i;
        index.sort(this.compare.bind(this));
        return(index);
    }

    this.compare = function(a, b) {
        if (a > b) return this.table[a][b]
        else return 1 - this.table[b][a];
    }
}

// CREATE RANDOM ORDER THAT WILL SERVE AS THE PERSON'S PREFERENCE
var fruit = ["orange", "apple", "pear", "banana", "kiwifruit", "grapefruit", "peach", "cherry", "starfruit", "strawberry"];
var pref = fruit.slice();
for (i in pref) pref.push(pref.splice(Math.floor(Math.random() * (pref.length - i)),1)[0]);
pref.join(" ");

// THIS FUNCTION ACTS AS THE PERSON ANSWERING THE QUESTIONS
function preference(a, b) {
    if (pref.indexOf(a) - pref.indexOf(b) < 0) return -1
    else if (pref.indexOf(a) - pref.indexOf(b) > 0) return 1
    else return 0;
}

// CREATE TABLE AND ASK QUESTIONS UNTIL TABLE IS COMPLETE
var t = new PrefTable(10), c = 0, q;
while (q = t.getQuestion()) {
    console.log(++c + ". " + fruit[q.a] + " or " + fruit[q.b] + "?");
    var answer = preference(fruit[q.a], fruit[q.b]);
    console.log("\t" + [fruit[q.a], "whatever", fruit[q.b]][answer + 1]);
    t.addAnswer(q.a, q.b, answer);
}

// PERFORM SORT BASED ON TABLE
var index = t.getOrder();

// DISPLAY RESULT
console.log("LIST IN ORDER:");
for (var i in index) console.log(i + ". " + fruit[index[i]]);

update 1: asking the questions in the right order

If you ask the questions in order, filling up the triangular matrix from top to bottom, what you're actually doing is this: keeping a preliminary order of the items you've already asked about, introducing new items one at a time, comparing it with previous items until you know where to insert it in the preliminary order, and then moving on to the next item.

This algorithm has one obvious opportunity for optimisation: if you want to insert a new item into an ordered list, instead of comparing it to each item in turn, you compare it with the item in de middle: that tells you which half to new item goes into; then you compare it with the item in the middle of that half, and so on... This limits the maximum number of steps to log2(n)+1.

Below is a version of the code that uses this method. In practice, it offers very consistent results, and the number of questions needed is on average 22.21, less than half of the maximum 45. And all the results are in the 19 to 25 range (tested on 100,000 samples, no "=" answers).

distribution: number of questions (optimised order)

The advantage of this optimisation becomes more pronounced as the number of items increases; for 20 items, out of a possible 190 questions, the random method gives an average of 77 (40.5%), while the optimised method gives an average of 62 (32.6%). At 50 items, that is 300/1225 (24.5%) versus 217/1225 (17.7%).

function PrefList(n) {
    this.size = n;
    this.items = [{item: 0, equals: []}];
    this.current = {item: 1, try: 0, min: 0, max: 1};

    this.addAnswer = function(x, y, pref) {
        if (pref == 0) {
            this.items[this.current.try].equals.push(this.current.item);
            this.current = {item: ++this.current.item, try: 0, min: 0, max: this.items.length};
        } else {
            if (pref == -1) this.current.max = this.current.try
            else this.current.min = this.current.try + 1;
            if (this.current.min == this.current.max) {
                this.items.splice(this.current.min, 0, {item: this.current.item, equals: []});
                this.current = {item: ++this.current.item, try: 0, min: 0, max: this.items.length};
            }
        }
    }

    this.getQuestion = function() {
        if (this.current.item >= this.size) return null;
        this.current.try = Math.floor((this.current.min + this.current.max) / 2);
        return({a: this.current.item, b: this.items[this.current.try].item});
    }

    this.getOrder = function() {
        var index = [];
        for (var i in this.items) {
            index.push(this.items[i].item);
            for (var j in this.items[i].equals) {
                index.push(this.items[i].equals[j]);
            }
        }
        return(index);
    }
}

// PREPARE TEST DATA
var fruit = ["orange", "apple", "pear", "banana", "kiwifruit", "grapefruit", "peach", "cherry", "starfruit", "strawberry"];
var pref = fruit.slice();
for (i in pref) pref.push(pref.splice(Math.floor(Math.random() * (pref.length - i)),1)[0]);
pref.join(" ");

// THIS FUNCTION ACTS AS THE PERSON ANSWERING THE QUESTIONS
function preference(a, b) {
    if (pref.indexOf(a) - pref.indexOf(b) < 0) return -1
    else if (pref.indexOf(a) - pref.indexOf(b) > 0) return 1
    else return 0;
}

// CREATE TABLE AND ASK QUESTIONS UNTIL TABLE IS COMPLETE
var t = new PrefList(10), c = 0, q;
while (q = t.getQuestion()) {
    console.log(++c + ". " + fruit[q.a] + " or " + fruit[q.b] + "?");
    var answer = preference(fruit[q.a], fruit[q.b]);
    console.log("\t" + [fruit[q.a], "whatever", fruit[q.b]][answer + 1]);
    t.addAnswer(q.a, q.b, answer);
}

// PERFORM SORT BASED ON TABLE
var index = t.getOrder();

// DISPLAY RESULT

console.log("LIST IN ORDER:");
for (var i in index) console.log(i + ". " + fruit[index[i]]);

I think this is as far as you can optimise the binary question process for a single person. The next step is to figure out how to ask several people's preferences and combine them without introducing conflicting data into the matrix.

update 2: sorting based on the preferences of more than one person

While experimenting (in my previous answer) with algorithms where different people would answer each question, it was clear that the conflicting preferences would create a preference table with inconsistent data, which wasn't useful as a basis for comparison in a sorting algorithm.

The two algorithms earlier in this answer offer possibilities to deal with this problem. One option would be to fill out the preference table with votes in percentages instead of "before", "after" and "equal" as the only options. Afterwards, you could search for inconsistencies, and fix them by changing the decision with the closest vote, e.g. if apples vs. oranges was 80/20%, oranges vs. pears was 70/30%, and pears vs. apples was 60/40%, changing the preference from "pears before apples" to "apples before pears" would be the best way to resolve the inconsistency.

Another option would be to skip unnecessary questions, thereby removing the chance of inconsistencies in the preference table. This would be the easiest method, but the order in which the questions are asked would then have a greater impact on the end result.

The second algorithm inserts each item into a preliminary order by first checking whether it goes in the first or last half, then whether it goes in the first or last half of that half, and so on... steadily zooming in on the correct position in ever decreasing steps. This means the sequence of decisions used to determine the position of each item are of decreasing importance. This could be the basis of a system where more people are asked to vote for important decisions, and less people for less important decisions, thus reducing the number of questions that each person has to answer.

If the number of people is much greater than the number of items, you could use something like this: with every new item, the first question is put to half of the people, and every further question is then put to half of the remaining people. That way, everyone would have to answer at most one question per item, and for the whole list everyone would answer at most the number of questions equal to the number of items.

Again, with large groups of people, there are possibilities to use statistics. This could decide at which point a certain answer has developed a statistically significant lead, and the question can be considered as answered, without asking any more people. It could also be used to decide how close a vote has to be to be considered an "equal" answer.

update 3: ask subgroups based on importance of questions

This code version reduces the number of questions per person by asking important questions to a large subgroup of the population and less important questions to a smaller subgroup, as discussed in update 2.
e.g. When finding the position of the eighth item in a list already containing 7 items, a maximum number of 3 questions is needed to find the correct position; the population will therefor be split into 3 groups, whose relative sizes are 4:2:1.
The example orders 10 items based on the preferences of 20 people; the maximum number of questions any person is asked is 9.

function GroupPref(popSize, listSize) {	// CONSTRUCTOR
    if (popSize < steps(listSize)) return {};
    this.population = popSize;
    this.people = [];
    this.groups = [this.population];
    this.size = listSize;
    this.items = [{item: 0, equals: []}];
    this.current = {item: 1, question: 0, try: 0, min: 0, max: 1};

    this.getQuestion = function() {
        if (this.current.item >= this.size) return null;
        if (this.current.question == 0) this.populate();
        var group = this.people.splice(0, this.groups[this.current.question++]);
        this.current.try = Math.floor((this.current.min + this.current.max) / 2);
        return({people: group, a: this.current.item, b: this.items[this.current.try].item});
    }

    this.processAnswer = function(pref) {
        if (pref == 0) {
            this.items[this.current.try].equals.push(this.current.item);
        } else {
            if (pref < 0) this.current.max = this.current.try
            else this.current.min = this.current.try + 1;

            if (this.current.min == this.current.max) {
                this.items.splice(this.current.min, 0, {item: this.current.item, equals: []});
            } else return;
        }
        this.current = {item: ++this.current.item, question: 0, try: 0, min: 0, max: this.items.length};
        this.distribute();
    }

    function steps(n) {
        return Math.ceil(Math.log(n) / Math.log(2));
    }

    this.populate = function() {
        for (var i = 0; i < this.population; i++) this.people.splice(Math.floor(Math.random() * (i + 1)), 0, i);
    }

    this.distribute = function() {
        var total = this.population, groups = steps(this.current.item + 1);
        this.groups.length = 0;
        for (var i = 0; i < groups; i++) {
            var size = Math.round(Math.pow(2, i) * total / (Math.pow(2, groups) - 1));
            if (size == 0) ++size, --total;
            this.groups.unshift(size);
        }
    }

    this.getOrder = function() {
        var index = [];
        for (var i in this.items) {
            var equal = [this.items[i].item];
            for (var j in this.items[i].equals) {
                equal.push(this.items[i].equals[j]);
            }
            index.push(equal);
        }
        return(index);
    }
}

// PREPARE TEST DATA
var fruit = ["orange", "apple", "pear", "banana", "kiwifruit", "grapefruit", "peach", "cherry", "starfruit", "strawberry"];
var pref = [];
for (i = 0; i < 20; i++) {
    var temp = fruit.slice();
    for (j in temp) temp.push(temp.splice(Math.floor(Math.random() * (temp.length - j)), 1)[0]);
    pref[i] = temp.join(" ");
}

// THIS FUNCTION ACTS AS THE PERSON ANSWERING THE QUESTIONS
function preference(person, a, b) {
    if (pref[person].indexOf(a) - pref[person].indexOf(b) < 0) return -1
    else if (pref[person].indexOf(a) - pref[person].indexOf(b) > 0) return 1
    else return 0;
}

// CREATE LIST AND ANSWER QUESTIONS UNTIL LIST IS COMPLETE
var t = new GroupPref(20, 10), c = 0, q;
while (q = t.getQuestion()) {
    var answer = 0;
    console.log(++c + ". ask " + q.people.length + " people (" + q.people + ")\n\tq: " + fruit[q.a] + " or " + fruit[q.b] + "?");
    for (i in q.people) answer += preference(q.people[i], fruit[q.a], fruit[q.b]);
    console.log("\ta: " + [fruit[q.a], "EQUAL", fruit[q.b]][answer != 0 ? answer / Math.abs(answer) + 1 : 1]);
    t.processAnswer(answer);
}

// GET ORDERED LIST AND DISPLAY RESULT
var index = t.getOrder();
console.log("LIST IN ORDER:");
for (var i = 0, pos = 1; i < index.length; i++) {
    var pre = pos + ". ";
    for (var j = 0; j < index[i].length; j++) {
        console.log(pre + fruit[index[i][j]]);
        pre = "   ";
    }
    pos += index[i].length;
}
like image 38