Users can be assigned to one experiment on my site. I have an API that developers use to trigger logic for each experiment. They call ExperimentEngine.run() to trigger the code logic for the experiment.
I would like to allocate traffic to each experiment at exposure, at the point where a user might be exposed to the logic for that experiment. I would like to allocate traffic so that experiments that users usually see last don't get starved.
For example, if user A is exposed to experiment A at login and then goes to page B and gets exposed to experiment B, user A should be assigned to either experiment A or experiment B at the point of exposure. That means they will see at most one of the experiments: either A, or B, or neither, but never both. I would like to figure out the right algorithm so that experiment B (which is downstream and shown to the user after they've seen experiment A) does not get starved of traffic. I don't want all traffic going to experiment A.
So the flow is as follows:
Can someone please point me in the right direction to an algorithm for efficiently allocating traffic to experiments, so that they reach sample size and statistical significance in good time? The constraints are that traffic is allocated at the point of exposure, that experiments are "exposed" to the user at different points of the flow (early or later on), and that experiments exposed later on must not be starved of traffic.
A possible algorithm:
What I am struggling with is what that priority system algorithm should be. Is this also the most efficient way to assign users to experiments that are implemented at different points of the flow? How do we decide whether to assign users to an experiment at a specific location? Right now we use a coin flip, but that means 50% of the still-unassigned users are assigned to an experiment at each location, so each later location sees half the traffic of the one before it, which does not work.
If you can collect lists of page visits per user, then for any choice of the probabilities of running an experiment when a user visits its page, you can work out the overall probability with which each experiment is run.
Given this, you need to work out what collection of probability settings will achieve the desired result. Suppose a user track visits pages A, B, C, each running a different experiment with probabilities p, q, r. A user only reaches a later page unassigned if every earlier coin flip failed, so the probability of running A is p, the probability of running B is q(1-p), and the probability of running C is r(1-q)(1-p). The overall probabilities are the sums of these terms over all of the user tracks, so you can work out not only the probabilities as a function of p, q, r but also the derivatives of these probabilities with respect to p, q, r.
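As a rough illustration of that calculation, here is a minimal Python sketch. The function name `run_probabilities`, the track format (one list of page names per user) and the `page_prob` mapping are all assumptions made for the example, not part of any existing API. It also assumes each page appears at most once per track, and it divides the summed terms by the number of tracks so the result reads as a fraction of users.

```python
from collections import defaultdict


def run_probabilities(tracks, page_prob):
    """Fraction of users expected to run each page's experiment.

    tracks    -- list of page sequences, one per user (hypothetical format)
    page_prob -- page -> probability of assigning a still-unassigned user
                 to that page's experiment when they arrive
    """
    totals = defaultdict(float)
    for track in tracks:
        unassigned = 1.0  # probability the user has not been assigned yet
        for page in track:
            p = page_prob[page]
            totals[page] += unassigned * p  # e.g. q(1-p) for the second page
            unassigned *= 1.0 - p
    # Average over tracks so each value is a per-user probability.
    return {page: total / len(tracks) for page, total in totals.items()}


tracks = [["A", "B", "C"]]
coin_flip = run_probabilities(tracks, {"A": 0.5, "B": 0.5, "C": 0.5})
balanced = run_probabilities(tracks, {"A": 1 / 3, "B": 1 / 2, "C": 1.0})
```

With a 50% coin flip at every page, the single track A, B, C gives A half of the users, B a quarter and C an eighth, which is the starvation described in the question. Setting p = 1/3, q = 1/2, r = 1 gives each experiment a third of the users on that track.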
This means that you should be able to find some numerical analysis optimization routine that will find values of p,q,r... to minimize the sum of the squared differences between the probabilities of running particular experiments from those values and whatever target values for those probabilities you have.
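One way this fit could look is sketched below in plain Python. It is only an illustration under several assumptions: the names `fit_page_probs`, `loss` and `run_probabilities` are hypothetical, the optimizer is a simple projected gradient descent rather than a library routine, and the gradient is estimated by finite differences for brevity even though analytic derivatives are available. The step size and iteration count are arbitrary choices that happen to suit small examples.

```python
def run_probabilities(tracks, page_prob):
    """Per-user probability of running each page's experiment."""
    totals = {page: 0.0 for page in page_prob}
    for track in tracks:
        unassigned = 1.0  # probability the user is still unassigned
        for page in track:
            totals[page] += unassigned * page_prob[page]
            unassigned *= 1.0 - page_prob[page]
    return {page: t / len(tracks) for page, t in totals.items()}


def loss(tracks, page_prob, targets):
    """Sum of squared differences between achieved and target probabilities."""
    achieved = run_probabilities(tracks, page_prob)
    return sum((achieved[page] - targets[page]) ** 2 for page in targets)


def fit_page_probs(tracks, targets, steps=2000, lr=0.2, eps=1e-6):
    """Projected gradient descent on the squared-error loss (a sketch)."""
    probs = {page: 0.5 for page in targets}  # start from the coin flip
    for _ in range(steps):
        grad = {}
        for page in probs:
            up = dict(probs)
            up[page] += eps
            down = dict(probs)
            down[page] -= eps
            # Central finite difference in place of the analytic derivative.
            grad[page] = (
                loss(tracks, up, targets) - loss(tracks, down, targets)
            ) / (2 * eps)
        # Take a step, then clip back into the valid range [0, 1].
        probs = {
            page: min(1.0, max(0.0, probs[page] - lr * grad[page]))
            for page in probs
        }
    return probs


tracks = [["A", "B"]]
targets = {"A": 0.5, "B": 0.5}
fitted = fit_page_probs(tracks, targets)
```

For this single track the only zero-loss setting is 0.5 at page A and 1.0 at page B, and the fitted values should land close to it. With real traffic you would pass in the collected tracks and your own targets, and a proper optimization library would likely be a better choice than this hand-rolled loop.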
(Actually, the maths might be nicer if you optimize some linear function of the probabilities of the user running the various experiments, probably varying the linear function until you get a result that appeals to you.)