Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Good Idea/Bad Idea? Using MySQL RAND() outside of a small set of subquery results?

Tags:

php

random

mysql

So in MySQL, I've read that for large tables with lots of rows, using ORDER BY RAND() is a bad idea (even with ~500 row tables, supposedly). Slow and inefficient. Lots of row scanning.

How does this (below) seem for an alternative?

SELECT * FROM (...subquery that generally returns a set of fewer than 20 rows...) ORDER BY RAND() LIMIT 8

Instead of using RAND() on a large set of data, I'd select a small subset, and only then would I apply RAND() on those returned rows. In 99.9% of all cases, the subquery seen above should select fewer than 20 rows (and in fact, it's generally fewer than 8).

Curious to hear what people think.

(Just for reference, I'm doing my MySQL stuff with PHP.)

Thanks!

like image 721
hithere Avatar asked Feb 25 '23 08:02

hithere


1 Answers

Actually...I ended up running a test and I might have answered my own question. I thought I'd post this information here in case it was useful for anyone else. (If I've done anything wrong here, please let me know!)

This is kind of surprising...

Contrary to everything that I've read, I created a table called TestData with 1 million rows and ran the following query:

SELECT * FROM TestData WHERE number = 41 ORDER BY RAND() LIMIT 8

...and it returned the rows in an average of 0.0070 seconds. I don't really see why RAND() has such a bad reputation. It seems pretty usable to me, at least in this particular situation.

I have three columns in my table:

id [BIGINT(20)] | textfield [tinytext] | number [BIGINT(20)]

Primary Key on id, index on number.

I guess MySQL is smart enough to know that it should only be applying RAND() to the 20 rows that are returned by "WHERE number = 41" ? (I specifically added only 20 rows that had the value 41 for 'number'.)

The alternate subquery method returns results with an average time of around .0080 seconds, which is slower than the non-subquery method.

Subquery method: SELECT * FROM (SELECT * FROM TestData WHERE number = 41) as t ORDER BY RAND() LIMIT 8

like image 76
hithere Avatar answered Apr 27 '23 20:04

hithere