So in MySQL, I've read that for large tables with lots of rows, using ORDER BY RAND() is a bad idea (even with ~500 row tables, supposedly). Slow and inefficient. Lots of row scanning.
How does this (below) seem for an alternative?
SELECT * FROM (...subquery that generally returns a set of fewer than 20 rows...) ORDER BY RAND() LIMIT 8
Instead of using RAND() on a large set of data, I'd select a small subset, and only then would I apply RAND() on those returned rows. In 99.9% of all cases, the subquery seen above should select fewer than 20 rows (and in fact, it's generally fewer than 8).
Curious to hear what people think.
(Just for reference, I'm doing my MySQL stuff with PHP.)
Thanks!
Actually...I ended up running a test and I might have answered my own question. I thought I'd post this information here in case it was useful for anyone else. (If I've done anything wrong here, please let me know!)
This is kind of surprising...
Contrary to everything that I've read, I created a table called TestData with 1 million rows and ran the following query:
SELECT * FROM TestData WHERE number = 41 ORDER BY RAND() LIMIT 8
...and it returned the rows in an average of 0.0070 seconds. I don't really see why RAND() has such a bad reputation. It seems pretty usable to me, at least in this particular situation.
I have three columns in my table:
id [BIGINT(20)] | textfield [tinytext] | number [BIGINT(20)]
Primary Key on id, index on number.
I guess MySQL is smart enough to know that it should only be applying RAND() to the 20 rows that are returned by "WHERE number = 41" ? (I specifically added only 20 rows that had the value 41 for 'number'.)
The alternate subquery method returns results with an average time of around .0080 seconds, which is slower than the non-subquery method.
Subquery method: SELECT * FROM (SELECT * FROM TestData WHERE number = 41) as t ORDER BY RAND() LIMIT 8
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With