I've been doing some research and testing on how to do fast random selection in MySQL. In the process I've faced some unexpected results and now I am not fully sure I know how ORDER BY RAND() really works.
I always thought that when you do ORDER BY RAND() on the table, MySQL adds a new column to the table which is filled with random values, then it sorts data by that column and then e.g. you take the above value which got there randomly. I've done lots of googling and testing and finally found that the query Jay offers in his blog is indeed the fastest solution:
SELECT * FROM Table T JOIN (SELECT CEIL(MAX(ID)*RAND()) AS ID FROM Table) AS x ON T.ID >= x.ID LIMIT 1;
While common ORDER BY RAND() takes 30-40 seconds on my test table, his query does the work in 0.1 seconds. He explains how this functions in the blog so I'll just skip this and finally move to the odd thing.
My table is a common table with a PRIMARY KEY id
and other non-indexed stuff like username
, age
, etc. Here's the thing I am struggling to explain
SELECT * FROM table ORDER BY RAND() LIMIT 1; /*30-40 seconds*/ SELECT id FROM table ORDER BY RAND() LIMIT 1; /*0.25 seconds*/ SELECT id, username FROM table ORDER BY RAND() LIMIT 1; /*90 seconds*/
I was sort of expecting to see approximately the same time for all three queries since I am always sorting on a single column. But for some reason this didn't happen. Please let me know if you any ideas about this. I have a project where I need to do fast ORDER BY RAND() and personally I would prefer to use
SELECT id FROM table ORDER BY RAND() LIMIT 1; SELECT * FROM table WHERE id=ID_FROM_PREVIOUS_QUERY LIMIT 1;
which, yes, is slower than Jay's method, however it is smaller and easier to understand. My queries are rather big ones with several JOINs and with WHERE clause and while Jay's method still works, the query grows really big and complex because I need to use all the JOINs and WHERE in the JOINed (called x in his query) sub request.
Thanks for your time!
If you ORDER BY RAND() a random number is calculated for every single row in the table. This is because it must calculate the random value for every row in order to know which row generated the largest value.
We learned that SQL Server doesn't guarantee any order of the results stored in the table, nor in the results set returned from your queries, but we can sort the output by using the order by clause.
Now, the SQL ORDER BY clause does not only work for columns containing string values. It can handle numbers as well! Let's sort the list by employee number in descending order.
While there's no such thing as a "fast order by rand()", there is a workaround for your specific task.
For getting any single random row, you can do like this german blogger does: http://web.archive.org/web/20200211210404/http://www.roberthartung.de/mysql-order-by-rand-a-case-study-of-alternatives/ (I couldn't see a hotlink url. If anyone sees one, feel free to edit the link.)
The text is in german, but the SQL code is a bit down the page and in big white boxes, so it's not hard to see.
Basically what he does is make a procedure that does the job of getting a valid row. That generates a random number between 0 and max_id, try fetching a row, and if it doesn't exist, keep going until you hit one that does. He allows for fetching x number of random rows by storing them in a temp table, so you can probably rewrite the procedure to be a bit faster fetching only one row.
The downside of this is that if you delete A LOT of rows, and there are huge gaps, the chances are big that it will miss tons of times, making it ineffective.
Update: Different execution times
SELECT * FROM table ORDER BY RAND() LIMIT 1; /30-40 seconds/
SELECT id FROM table ORDER BY RAND() LIMIT 1; /0.25 seconds/
SELECT id, username FROM table ORDER BY RAND() LIMIT 1; /90 seconds/
I was sort of expecting to see approximately the same time for all three queries since I am always sorting on a single column. But for some reason this didn't happen. Please let me know if you any ideas about this.
It may have to do with indexing. id
is indexed and quick to access, whereas adding username
to the result, means it needs to read that from each row and put it in the memory table. With the *
it also has to read everything into memory, but it doesn't need to jump around the data file, meaning there's no time lost seeking.
This makes a difference only if there are variable length columns (varchar/text), which means it has to check the length, then skip that length, as opposed to just skipping a set length (or 0) between each row.
It may have to do with indexing. id is indexed and quick to access, whereas adding username to the result, means it needs to read that from each row and put it in the memory table. With the * it also has to read everything into memory, but it doesn't need to jump around the data file, meaning there's no time lost seeking. This makes a difference only if there are variable length columns, which means it has to check the length, then skip that length, as opposed to just skipping a set length (or 0) between each row
Practice is better that all theories! Why not just to check plans? :)
mysql> explain select name from avatar order by RAND() limit 1; +----+-------------+--------+-------+---------------+-----------------+---------+------+-------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------+-------+---------------+-----------------+---------+------+-------+----------------------------------------------+ | 1 | SIMPLE | avatar | index | NULL | IDX_AVATAR_NAME | 302 | NULL | 30062 | Using index; Using temporary; Using filesort | +----+-------------+--------+-------+---------------+-----------------+---------+------+-------+----------------------------------------------+ 1 row in set (0.00 sec) mysql> explain select * from avatar order by RAND() limit 1; +----+-------------+--------+------+---------------+------+---------+------+-------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------+------+---------------+------+---------+------+-------+---------------------------------+ | 1 | SIMPLE | avatar | ALL | NULL | NULL | NULL | NULL | 30062 | Using temporary; Using filesort | +----+-------------+--------+------+---------------+------+---------+------+-------+---------------------------------+ 1 row in set (0.00 sec) mysql> explain select name, experience from avatar order by RAND() limit 1; +----+-------------+--------+------+--------------+------+---------+------+-------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------+------+---------------+------+---------+------+-------+---------------------------------+ | 1 | SIMPLE | avatar | ALL | NULL | NULL | NULL | NULL | 30064 | Using temporary; Using filesort | +----+-------------+--------+------+---------------+------+---------+------+-------+---------------------------------+
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With