Efficient node traffic allocation

Question

Each user on my site can be assigned to only one experiment. I have an API that developers use to trigger the logic for each experiment: they call ExperimentEngine.run() to run the code for that experiment.

I would like to allocate traffic to each experiment at exposure, at the point where a user might be exposed to the logic for that experiment. I would like to allocate traffic so that experiments that users usually see last don't get starved.

For example, if user A is exposed to experiment A at login and then goes to page B and gets exposed to experiment B, user A should be assigned to either experiment A or experiment B at the point of exposure. That means they will see at most one of the experiments (A, B, or neither), never both. I would like to figure out the right algorithm so that experiment B (which is downstream and shown to the user after they've seen experiment A) does not get starved of traffic. I don't want all traffic going to experiment A.

So the flow is as follows:

User visits page A where experiment A is implemented
We decide whether to assign users to experiment A. If user is assigned to A, user will be able to see experiment A.
User visits page B where experiment B is implemented, we decide whether to assign users to experiment B
Users can only see experiments that they are assigned to.
I want to come up with an algorithm that assigns traffic to experiments regardless of where they are implemented, so that the traffic distribution is efficient and experiments implemented downstream don't get starved (even though a user sees experiment B last, they still get a good chance of being assigned to B).

Can someone please point me in the right direction to an algorithm for allocating traffic efficiently, so that experiments reach sample size and statistical significance in good time? Traffic is allocated at the point of exposure, experiments are "exposed" to the user at different points of the flow (earlier or later on), and experiments exposed later on should not be starved of traffic.

A possible algorithm:

At each location where experiments are implemented, we flip a coin to decide whether to assign the user to an experiment there.
If we get heads, we select the list of experiments that match the user's criteria and that are implemented at that location.
An experiment is chosen from that list based on a priority system. At every location, a % of users are assigned to one of the experiments implemented at that location.
Once we have decided whether or not to assign the user to an experiment at that location, that decision is not made again for the user.
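The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the `assign_prob` parameter, and the uniform pick standing in for the priority system are all hypothetical, and hashing the user ID is one common way (not necessarily the one used on this site) to make sure the decision at a location is never made twice for the same user.

```python
import hashlib

def unit_hash(user_id: str, salt: str) -> float:
    """Map (user_id, salt) to a reproducible number in [0, 1)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # 48 bits fit exactly in a float, so the result is always below 1.0.
    return int.from_bytes(digest[:6], "big") / 2**48

def decide_at_location(user_id, location, assign_prob, experiments, already_assigned):
    """Return the experiment the user should see at this location, or None.

    assign_prob      -- hypothetical per-location chance of assigning (the biased "coin")
    experiments      -- eligible experiments implemented at this location
    already_assigned -- experiment the user already holds, or None
    """
    if already_assigned is not None:
        # A user holds at most one experiment; show it only if it lives here.
        return already_assigned if already_assigned in experiments else None
    if not experiments:
        return None
    # Same user + same location always yields the same coin flip.
    if unit_hash(user_id, f"gate:{location}") >= assign_prob:
        return None
    # Placeholder for the priority system: a reproducible uniform pick.
    pick = unit_hash(user_id, f"pick:{location}")
    return experiments[int(pick * len(experiments))]
```

With `assign_prob` fixed at 0.5 this reproduces the plain coin flip; the open question is how to choose a different value per location.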

What I am struggling with is what that priority system should be. Also, is this the most efficient way to assign users to experiments that are implemented at different points of the flow? How do we decide whether to assign users to an experiment at a specific location? Right now we use a coin flip, but that means 50% of users will be assigned to an experiment at each location, which does not work.

mcdowella · Accepted Answer

If you can collect lists of page visits per user, then for any choice of per-page probabilities of running an experiment when a user visits its page, you can work out the overall probability with which each experiment is run.

Given this, you need to work out what collection of probability settings will achieve the desired result. If you have a user track that visits pages A, B, C, each running a different experiment with probabilities p, q, r, then the probability of running A is p, the probability of running B is q(1-p), and the probability of running C is r(1-q)(1-p). The overall probabilities are the sums of these terms over all of the user tracks, weighted by how often each track occurs. So you can work out not only the probabilities as a function of p, q, r but also the derivatives of these probabilities with respect to p, q, r.
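As a small sketch of that calculation, assuming one experiment per page and that a repeat visit to a page does not trigger a second decision (the function name and the `(pages, weight)` track format are made up for illustration):

```python
from collections import defaultdict

def assignment_probabilities(tracks, gate):
    """Overall probability that a user ends up in each page's experiment.

    tracks -- list of (pages, weight): an ordered list of pages visited and
              how many users followed that track
    gate   -- page -> probability of running its experiment on a visit
              (the p, q, r of the text)
    """
    total = sum(weight for _, weight in tracks)
    result = defaultdict(float)
    for pages, weight in tracks:
        unassigned = 1.0  # chance the user is still unassigned so far
        seen = set()
        for page in pages:
            if page in seen:
                continue  # the decision at a page is made only once
            seen.add(page)
            result[page] += (weight / total) * unassigned * gate[page]
            unassigned *= 1.0 - gate[page]
    return dict(result)

# One track A -> B -> C with p = 1/3, q = 1/2, r = 1:
# A gets p = 1/3, B gets q(1-p) = 1/3, C gets r(1-q)(1-p) = 1/3.
shares = assignment_probabilities(
    [(["A", "B", "C"], 1)], {"A": 1 / 3, "B": 1 / 2, "C": 1.0}
)
```

So on a single linear track, equal shares require the downstream probabilities to grow (1/3, then 1/2, then 1), which is why a fixed 50% coin at every location starves later experiments.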

This means that you should be able to find some numerical optimization routine that will find values of p, q, r, ... that minimize the sum of the squared differences between the probabilities of running particular experiments implied by those values and whatever target values you have for those probabilities.
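A minimal sketch of such a routine, using plain projected gradient descent with finite-difference derivatives instead of a library optimizer. The names `assigned_share`, `loss`, and `fit_gates`, the step size, and the iteration count are all assumptions chosen for illustration; a real system would more likely hand the same loss to an existing least-squares solver.

```python
def assigned_share(tracks, gate):
    """Overall probability of being assigned to each page's experiment."""
    total = sum(weight for _, weight in tracks)
    share = {page: 0.0 for page in gate}
    for pages, weight in tracks:
        unassigned = 1.0
        for page in dict.fromkeys(pages):  # drop repeat visits, keep order
            share[page] += weight / total * unassigned * gate[page]
            unassigned *= 1.0 - gate[page]
    return share

def loss(tracks, gate, target):
    """Sum of squared differences between achieved and target shares."""
    share = assigned_share(tracks, gate)
    return sum((share[p] - target[p]) ** 2 for p in target)

def fit_gates(tracks, target, steps=5000, lr=0.2, eps=1e-6):
    """Search for per-page probabilities whose shares approach the targets."""
    gate = {p: 0.5 for p in target}  # start from the plain coin flip
    for _ in range(steps):
        base = loss(tracks, gate, target)
        grad = {}
        for p in gate:
            bumped = dict(gate)
            bumped[p] += eps
            grad[p] = (loss(tracks, bumped, target) - base) / eps
        for p in gate:
            # Step downhill, then clip back into the valid range [0, 1].
            gate[p] = min(1.0, max(0.0, gate[p] - lr * grad[p]))
    return gate

tracks = [(["A", "B", "C"], 1.0)]
target = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
gate = fit_gates(tracks, target)
```

For this single track the exact solution is p = 1/3, q = 1/2, r = 1, so the fitted values should move toward those numbers. With several tracks of different weights there is generally no closed form, and some targets may not be reachable at all, in which case the routine settles for the closest achievable shares it can find.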

(Actually, the maths might be nicer if you optimize some linear function of the probabilities of users running the various experiments, probably varying the linear function until you get a result that appeals to you.)

Tags:

algorithm

computer-science

statistics

graph-algorithm

Chris Hansen

1 Answers

mcdowella
