I am trying to find/figure out a function that can update probabilities.
Suppose there are three players and each of them get a fruit out of a basket: ["apple", "orange", "banana"]
I store the probabilities of each player having each fruit in a matrix (like this table):
| apple | orange | banana | |
|---|---|---|---|
| Player 1 | 0.3333 | 0.3333 | 0.3333 | 
| Player 2 | 0.3333 | 0.3333 | 0.3333 | 
| Player 3 | 0.3333 | 0.3333 | 0.3333 | 
The table can be interpreted as the belief of someone (S) who doesn't know who has what. Each row and column sums to 1.0 because each player has one of the fruits and each fruit is at one of the players.
I want to update these probabilities based on some knowledge that S gains. Example information:
Player 1 did X. We know that Player 1 does X with 80% probability if he has an apple. With 50% if he has an orange. With 10% if he has a banana.
This can be written more concisely as [0.8, 0.5, 0.1] and let us call it reach_probability.
A fairly easy to comprehend example is:
probabilities = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
]
# Player 1's 
reach_probability = [1.0, 0.0, 1.0]
new_probabilities = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
The above example can be fairly easily thought through.
another example:
probabilities = [
    [0.25, 0.25, 0.50],
    [0.25, 0.50, 0.25],
    [0.50, 0.25, 0.25],
]
# Player 1's 
reach_probability = [1.0, 0.5, 0.5]
new_probabilities = [
    [0.4, 0.2, 0.4],
    [0.2, 0.5, 0.3],
    [0.4, 0.3, 0.3],
]
In my use case using a simulation is not an option. My probabilities matrix is big. Not sure if the only way to calculate this is using an iterative algorithm or if there is a better way.
I looked at bayesian stuff and not sure how to apply it in this case. Updating it row by row then spreading out the difference proportionally to the previous probabilities seems promising but I haven't managed to make it work correctly. Maybe it isn't even possible like that.
Initial condition: p(apple) = p(orange) = p(banana) = 1/3.
Player 1 did X. We know that Player 1 does X with 80% probability if he has an apple. With 50% if he has an orange. With 10% if he has a banana.
p(X | apple) = 0.8 p(x | orange) = 0.5 p(x | banana) = 0.1
Since apple, orange, and banana are all equally likely at 1/3, we have p(x) = 1/3 * 1.4) ~ 0.466666666.
Recall Bayes theorem: p(a | b) = p(b|a) * p(a) / p(b)
So p(apple | x) = p(x | apple) * p(apple) / p(x) = 0.8 * (1/3) / 0.46666666 ~ 57.14%
similarly p(orange | x) = 0.5 * (1/3) / 0.46666666 ~ 35.71%
and p(banana | x) = 0.1 * (1/3) / 0.46666666 ~ 7.14%
Taking your example:
probabilities = [
    [0.25, 0.25, 0.50],
    [0.25, 0.50, 0.25],
    [0.50, 0.25, 0.25],
]
# Player 1's 
reach_probability = [1.0, 0.5, 0.5]
new_probabilities = [
    [0.4, 0.2, 0.4],
    [0.2, 0.5, 0.3],
    [0.4, 0.3, 0.3],
]
p(x) = 0.25 * 1.0 + 0.25 * 0.5 + 0.5 * 0.5 = 0.625
p(a|x) = p(x|a) * p(a) / p(x) = 1.0 * 0.25 / 0.625 = 0.4
p(b|x) = p(x|b) * p(b) / p(x) = 0.5 * 0.25 / 0.625 = 0.2
p(c|x) = p(x|c) * p(c) / p(x) = 0.5 * 0.50 / 0.625 = 0.4
As desired. The other entries of each column can just be scaled to get a column sum of 1.0.
E.g. in column 1 we multiple the other entries by (1-0.4)/(1-0.25). This takes 0.25 -> 0.2 and 0.50 -> 0.40. Similarly for the other columns.
new_probabilities = [
    [0.4, 0.200, 0.4],
    [0.2, 0.533, 0.3],
    [0.4, 0.266, 0.3],
]
If then player 2 does y with the same conditional probabilities we get:
p(y) = 0.2 * 1.0 + 0.533 * 0.5 + 0.3 * 0.5 = 0.6165
p(a|y) = p(y|a) * p(a) / p(y) = 1.0 * 0.2 / 0.6165 = 0.3244
p(b|y) = p(y|b) * p(b) / p(y) = 0.5 * 0.533 / 0.6165 = 0.4323
p(c|y) = p(y|c) * p(c) / p(y) = 0.5 * 0.266 / 0.6165 = 0.2157
Check this document: Endgame Solving in Large Imperfect-Information Games∗
(S. Ganzfried, T. Sandholm, in International Conference on Autonomous Agents and MultiAgent Systems (AAMAS) (2015), pp. 37–45.)
Here is how I would approach this - have not worked through whether this has problems too but it seems alright in your examples.
Assume each update is of the form "X,Y has probability p'" Mark element X,Y dirty with delta p - p', where p was the old probability. Now, redistribute the delta proportionally to all unmarked elements in the row, then the column, marking each dirty with its own delta, and marking the first clean. Continue until no dirty entry remains.
0.5   0.5   0.0
0.0   0.5   0.5
0.5   0.0   0.5
Belief: 2,1 has probability zero.
0.5   0.0*  0.0    update 2,1 and mark dirty
0.0   0.5   0.5    delta is 0.5
0.5   0.0   0.5
1.0*  0.0'  0.0    distribute 0.5 to row & col
0.0   1.0*  0.5    update as dirty, both deltas -0.5
0.5   0.0   0.5
1.0'  0.0'  0.0    distribute -0.5 to rows & cols
0.0   1.0'  0.0*   update as dirty, both deltas 0.5
0.0*  0.0   0.5
1.0'  0.0'  0.0    distribute 0.5 to row & col
0.0   1.0'  0.0'   update as dirty, delta is -0.5
0.0'  0.0   1.0*
1.0'  0.0'  0.0    distribute on row/col
0.0   1.0'  0.0'   no new dirty elements, complete
0.0'  0.0   1.0'
In your first example:
1/3   1/3   1/3
1/3   1/3   1/3
1/3   1/3   1/3
Belief: 3,1 has probability 0
1/3   1/3   0*     update 3,1 to zero, mark dirty
1/3   1/3   1/3   delta is 1/3
1/3   1/3   1/3
1/2*  1/2*  0'    distribute 1/3 proportionally across row then col
1/3   1/3   1/2*  delta is -1/6
1/3   1/3   1/2*
1/2'  1/2'  0'    distribute -1/6 proportionally across row then col
1/4*  1/4*  1/2'  delta is 1/12
1/4*  1/4*  1/2'
1/2'  1/2'  0'    distribute prportionally to unmarked entries
1/4'  1/4'  1/2' no new dirty entries, terminate
1/4'  1/4'  1/2'
You can mark entries dirty by inserting them with associated deltas into a queue and a hashset. Entries in both the queue and hash set are dirty. Entries in the hashset only are clean. Process the queue until you run out of entries.
I do not show an example where distribution is uneven, but the key is to distribute proportionally. Entries with 0 can never become non-zero except by a new belief.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With