I have a weighted, directed graph which is dense with around 20,000 nodes.
I learned yesterday about the alias method for simulating rolling a weighted die, which is the same as making one choice (each node is one weighted die, and the sides correspond to other nodes). One roll is highly efficient, but updating the weights is not; the alias method may not be appropriate because I will be updating more dice than I will be rolling!
What data structure should I use, which allows for frequent updates, and what corresponding algorithm is best for making the choices?
Some ideas/notes:
The truth is that in practice I don't need to make choices very often (no more than once a minute), so I don't need the most efficient solution. But this is a fun side project and I'm interested in finding a theoretically optimal solution.
I think you can do it in O(log k) time, where k is the number of faces on the die.
For one particular node, let p1, p2, ..., pk be the relative probabilities, and let p = p1 + p2 + ... + pk.
Construct a tree with these relative probabilities as the leaves; each internal node stores the sum of its children's values. To "roll the die", draw a random value between 0 and p and follow it down the tree. To update the relative probability of a die face, change the corresponding leaf value and propagate the change up through the tree.
In this way one random number suffices per roll: finding the leaf corresponding to that value takes log(k) steps, and updating one leaf takes log(k) steps to repair the sums on the path to the root.
This is only a sketch of the solution; let me know if you need a complete description. I am sure it works, but I'm not sure whether it is efficient enough for your needs.
To summarize, this algorithm needs:
1. Only one random number between 0 and p per roll.
2. O(log(k)) time to "roll the die" (i.e. find the next node), where k is the number of faces on the die.
3. O(log(k)) time to "update the die" of a given node. If there are m edges from the original node, the total update cost is O(log(k1)) + O(log(k2)) + ... + O(log(km)), where k1, k2, ..., km are the degrees of the adjacent nodes.
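A minimal Python sketch of this sum tree, stored as a flat array (the class name `SumTree` and 0-based face indices are my own choices, not from the original post):

```python
import random

class SumTree:
    """Binary sum tree over k weights, stored in a flat array.

    Leaves hold the weights; each internal node holds the sum of its
    two children, so tree[1] (the root) is p, the total weight.
    Sampling and updating are both O(log k).
    """

    def __init__(self, weights):
        self.k = len(weights)
        # Round up to a power of two so leaves sit at tree[size .. size+k).
        self.size = 1
        while self.size < self.k:
            self.size *= 2
        self.tree = [0.0] * (2 * self.size)
        for i, w in enumerate(weights):
            self.tree[self.size + i] = w
        # Fill internal nodes bottom-up with the sums of their children.
        for i in range(self.size - 1, 0, -1):
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def update(self, face, weight):
        """Set one face's weight and repair the sums up to the root."""
        i = self.size + face
        self.tree[i] = weight
        i //= 2
        while i >= 1:
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def sample(self):
        """Draw one face with probability proportional to its weight."""
        v = random.uniform(0.0, self.tree[1])  # one random number in [0, p)
        i = 1
        while i < self.size:
            left = 2 * i
            if v < self.tree[left]:
                i = left               # descend left
            else:
                v -= self.tree[left]   # skip the left subtree's mass
                i = left + 1           # descend right
        return i - self.size           # leaf index = face index
```

With the example weights below, `SumTree([50, 80, 20, 70])` gives a root of 220, and `update(3, 60)` (face 4's weight dropping from 70 to 60) brings it to 210.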
====Tree Example====
If the die has 4 faces with relative probabilities 1:50, 2:80, 3:20, 4:70, construct the tree as follows:
220
/ \
130 90
/ \ / \
50 80 20 70
| | | |
1 2 3 4
Generate a random number v between 0 and 220. If v=100: take the left branch (since 100<130); at the 130 node, take the right branch (since 100>50) and update v = v-50 = 50. We are now at a leaf, so declare the output, i.e. face 2.
If v=210: take the right branch (210>130) and update v = 210-130 = 80; then take the right branch again (80>20) and update v = 80-20 = 60, landing on the leaf for face 4.
If face 4's weight changes from 70 to 60: change the leaf 70 to 60, its parent 90 to 80, and the root 220 to 210.
==== Lazy update variation ====
Whenever weights change, don't update the tree immediately. Instead, just mark them as "dirty weights" and wait until you need to make a choice from that particular node.
When you need to make a choice from a particular node and some of its weights are dirty, either (a) update the tree only along the paths of the dirty leaves, or (b) rebuild the whole tree. If t is the number of dirty weights and k the total number of weights: when t*log(k) < k, update only the dirty paths (O(t*log(k))); otherwise rebuild the whole tree (O(k)).
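A self-contained sketch of this lazy variant on the sum tree described earlier (the class name `LazySumTree` and the exact threshold check are my own; the t*log(k)-versus-k rule follows the text above):

```python
import math
import random

class LazySumTree:
    """Sum tree whose internal sums are repaired only when a sample is drawn.

    update() touches only the leaf and marks it dirty. sample() first
    flushes: with t dirty leaves out of k, it repairs each dirty path
    in O(t log k) when t*log2(k) < k, else rebuilds everything in O(k).
    """

    def __init__(self, weights):
        self.k = len(weights)
        self.size = 1
        while self.size < self.k:
            self.size *= 2
        self.tree = [0.0] * (2 * self.size)
        for i, w in enumerate(weights):
            self.tree[self.size + i] = w
        self._rebuild_all()
        self.dirty = set()

    def _rebuild_all(self):
        # O(k): recompute every internal sum bottom-up.
        for i in range(self.size - 1, 0, -1):
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def update(self, face, weight):
        self.tree[self.size + face] = weight  # leaf only; sums are stale
        self.dirty.add(face)

    def _flush(self):
        t = len(self.dirty)
        if t == 0:
            return
        if t * math.log2(self.size) < self.size:
            # Repair only the root paths of dirty leaves: O(t log k).
            for face in self.dirty:
                i = (self.size + face) // 2
                while i >= 1:
                    self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
                    i //= 2
        else:
            self._rebuild_all()
        self.dirty.clear()

    def sample(self):
        self._flush()
        v = random.uniform(0.0, self.tree[1])
        i = 1
        while i < self.size:
            if v < self.tree[2 * i]:
                i = 2 * i
            else:
                v -= self.tree[2 * i]
                i = 2 * i + 1
        return i - self.size
```

This fits the stated workload well: the frequent weight updates cost O(1) each, and the full repair cost is paid only at the rare sampling moments.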