I am doing A/B testing and I am running into Simpson's paradox in my results: the winning variant differs depending on whether I slice the data by day, by month, or over the total duration of the test.
Thanks for your great help.
Further reading: http://en.wikipedia.org/wiki/Simpson%27s_paradox
Simpson's paradox is a statistical phenomenon where an association between two variables in a population emerges, disappears, or reverses when the population is divided into subpopulations.
Simpson's paradox arises when there are hidden variables that split the data into multiple separate distributions. Such a hidden variable is aptly referred to as a lurking variable, and lurking variables can often be difficult to identify.
Some authors reserve the label Simpson's paradox for a reversal in the direction of the marginal and partial association between two categorical variables, while others apply it to reversals involving continuous as well as categorical variables.
Simpson's paradox is an extreme condition of confounding in which an apparent association between two variables is reversed when the data are analyzed within each stratum of a confounding variable.
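To make the definitions above concrete, here is a minimal sketch with made-up conversion counts, showing how a variant can win on every day yet lose in the aggregate once the per-day sample sizes are unbalanced between variants:

```python
# Hypothetical (conversions, trials) per day, per variant.
# A beats B on each individual day, but B beats A overall,
# because the daily sample sizes are skewed between variants.
data = {
    "Mon": {"A": (90, 100),   "B": (800, 1000)},  # A: 90%, B: 80%
    "Tue": {"A": (300, 1000), "B": (20, 100)},    # A: 30%, B: 20%
}

def rate(conversions, trials):
    return conversions / trials

totals = {"A": [0, 0], "B": [0, 0]}
for day, variants in data.items():
    for v, (conv, n) in variants.items():
        totals[v][0] += conv
        totals[v][1] += n
    print(f"{day}: A={rate(*variants['A']):.0%}  B={rate(*variants['B']):.0%}")

# Aggregated over both days, the direction reverses:
print(f"Total: A={rate(*totals['A']):.1%}  B={rate(*totals['B']):.1%}")
```

Here A wins 90% vs 80% on Monday and 30% vs 20% on Tuesday, yet B wins overall (about 74.5% vs 35.5%), because most of A's traffic arrived on the low-converting day.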
If A is clearly, significantly better in individual A/B tests, while B scores better in aggregate, then the main implication is that you can't aggregate those data sets that way. A is better.
If the testing had produced the same results every day, you wouldn't see this reversal, even with varying sample sizes per day. So I think it additionally implies that something has changed. It could be anything, though. Maybe what you tested each day changed (perhaps in some very subtle way, like server speed). Or maybe the people you're testing it on changed (perhaps demographically, perhaps just in terms of their mood). That doesn't mean your testing is bad or invalid. It just means you're measuring something that's moving, and that makes things tricky.
And I might be miscalculating or misunderstanding the situation, but I think it is also necessarily true that you haven't been testing A and B the same number of times. That is, if on Monday you tested A 50 times and B 50 times, and on Tuesday you tested A 600 times and B 600 times, and so on, and A outscored B each day, then I don't see how you could get an aggregate result where B beats A. If this is true of your test setup, it certainly seems like something you could fix to make your data easier to reason about.
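This claim can be checked with a short sketch (the daily sample sizes and rates below are hypothetical). When both variants get the same number of trials each day, both overall rates are weighted averages with the same weights, so a variant that wins every day must also win in aggregate:

```python
# If A and B share the same per-day sample sizes, the overall rate of
# each variant is a weighted average of its daily rates using identical
# weights. A winning every day then forces A to win in aggregate.
daily_n = [50, 600, 200]        # same number of trials for A and B each day
rates_a = [0.12, 0.30, 0.08]    # A's daily conversion rates (beats B daily)
rates_b = [0.10, 0.25, 0.05]    # B's daily conversion rates

def aggregate(rates, sizes):
    conversions = sum(r * n for r, n in zip(rates, sizes))
    return conversions / sum(sizes)

agg_a = aggregate(rates_a, daily_n)
agg_b = aggregate(rates_b, daily_n)
assert agg_a > agg_b  # no reversal is possible with equal daily sizes
```

With unequal per-day sizes between A and B, the two weighted averages use different weights, which is exactly what opens the door to the reversal.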
It's a little difficult to say without seeing the exact data & the dimensions you are testing, but generally speaking you want to make decisions based on the uncombined data. This article from Microsoft gives a pretty clear example of Simpson's paradox in software testing.
Can you provide a clean example of your combined and uncombined data and a brief summary of the test?