
ORDER BY RAND() seems to be less than random

I have a fairly simple SQL query (MySQL):

SELECT foo FROM bar ORDER BY rank, RAND()

I notice that when I refresh the results, the randomness is suspiciously weak.

In the sample data at the moment there are six results with equal rank (integer zero). There are lots of tests for randomness, but here is a simple one to do by hand: when the query is run twice, the first result should be the same in both runs about one sixth of the time. This is certainly not happening; the leading result is the same at least a third of the time.
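To put a number on that by-hand test, here is a minimal simulation sketch. It assumes an ideal uniform shuffle of six tied rows and says nothing about how MySQL actually behaves; the function name and the trial count are made up for illustration.

```python
import random

def first_result_match_rate(n_rows=6, trials=100_000, seed=0):
    """Estimate how often two independent uniform shuffles share the same first row."""
    rng = random.Random(seed)
    rows = list(range(n_rows))
    matches = 0
    for _ in range(trials):
        # rng.sample with k == len(rows) returns a uniformly random permutation.
        first_run = rng.sample(rows, n_rows)
        second_run = rng.sample(rows, n_rows)
        if first_run[0] == second_run[0]:
            matches += 1
    return matches / trials

rate = first_result_match_rate()
print(f"observed {rate:.4f}, expected {1/6:.4f}")
```

If the ordering were truly uniform, the observed rate should sit close to 1/6, so a rate of a third or more over many refreshes would indeed be suspicious.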

I want a uniform distribution over the permutations. I'm not an expert statistician but I'm pretty sure ORDER BY RAND() should achieve this. What am I missing?
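For what it's worth, the intuition can be checked outside the database. The sketch below assumes the sort key is drawn independently for every row (which is the behaviour the question expects from ORDER BY RAND()); it is a model in Python, not MySQL's implementation, and the helper name is hypothetical.

```python
import random
from collections import Counter
from itertools import permutations

def shuffle_by_random_key(rows, rng):
    # Attach an independent random key to each row, then sort by that key.
    # This models a sort expression that is evaluated once per row.
    keyed = [(rng.random(), row) for row in rows]
    keyed.sort(key=lambda pair: pair[0])
    return tuple(row for _, row in keyed)

rng = random.Random(1)
rows = ["a", "b", "c"]
trials = 60_000
counts = Counter(shuffle_by_random_key(rows, rng) for _ in range(trials))

# Each of the 3! = 6 orderings should show up roughly 1/6 of the time.
for perm in permutations(rows):
    print(perm, round(counts[perm] / trials, 4))
```

Under that per-row assumption every permutation comes out about equally often, so sorting by independent random keys does give a uniform distribution over orderings.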

With MySQL, SELECT rand(), rand() shows two different numbers, so I don't buy the "once per query" explanation.

spraff asked Oct 08 '22 05:10


2 Answers

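In SQL Server, RAND() is only executed once per query. You can verify this by looking at the result set: every row shows the same value.

If you're trying to get a randomized order there, you should be using either NEWID() or CHECKSUM(NEWID()), both of which are evaluated per row. Note that these are SQL Server functions and are not available in MySQL.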

WITH T AS ( -- example using RAND()
  SELECT 'Me' Name UNION SELECT 'You' UNION SELECT 'Another'
)
SELECT Name, RAND()
FROM T;

WITH T AS ( -- example using just NEWID()
  SELECT 'Me' Name UNION SELECT 'You' UNION SELECT 'Another'
)
SELECT Name, NEWID()
FROM T;

WITH T AS ( -- example getting the CHECKSUM() of NEWID()
  SELECT 'Me' Name UNION SELECT 'You' UNION SELECT 'Another'
)
SELECT Name, CHECKSUM(NEWID())
FROM T;
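
The difference between a key that is computed once and a key that is computed per row can be sketched outside the database. This is only a Python model of the two behaviours, not how any engine actually sorts; the variable names are made up for illustration. Python's sort is stable, so tied rows keep their input order, whereas a database makes no promise about the order of ties.

```python
import random

rows = ["Me", "You", "Another"]
rng = random.Random(2)

# Modelled "once per query" behaviour: one value is shared by every row,
# so all sort keys tie and the sort has nothing to reorder.
shared_key = rng.random()
once_per_query = sorted(rows, key=lambda _row: shared_key)

# Modelled per-row behaviour: a fresh value is drawn for each row,
# so the resulting order depends on the draws.
per_row = sorted(rows, key=lambda _row: rng.random())

print(once_per_query)  # same order as the input, because every key is equal
print(per_row)
```

With a shared key the sort cannot randomize anything, which is why a per-row source of randomness is needed for shuffling.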
Yuck answered Oct 12 '22 22:10


In SQL Server, RAND() is not re-evaluated for each row. A possible solution there might be:

SELECT foo FROM bar ORDER BY rank, CHECKSUM(NEWID())
Arion answered Oct 12 '22 22:10