Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to randomly select multiple rows satisfying certain conditions from a MySQL table?

Tags:

random

mysql

rows

I'm looking for an efficient way of randomly selecting 100 rows satisfying certain conditions from a MySQL table with potentially millions of rows.

Almost everything I've found suggests avoiding the use of ORDER BY RAND(), because of poor performance and scalability.

However, this article suggests ORDER BY RAND() may still be used as a "nice and fast way" to fetch randow data.

Based on this article, below is some example code showing what I'm trying to accomplish. My questions are:

  1. Is this an efficient way of randomly selecting 100 (or up to several hundred) rows from a table with potentially millions of rows?

  2. When will performance become an issue?

    SELECT  user.* 
    FROM    (
            SELECT  id 
            FROM    user 
            WHERE   is_active = 1
            AND     deleted = 0
            AND     expiretime > '.time().'
            AND     id NOT IN (10, 13, 15)
            AND     id NOT IN (20, 30, 50)
            AND     id NOT IN (103, 140, 250)
        ORDER BY    RAND() 
            LIMIT   100
            ) 
            AS      random_users
    STRAIGHT JOIN   user
    ON      user.id = random_users.id
like image 669
user1298692 Avatar asked Nov 14 '22 08:11

user1298692


1 Answers

Is strongly urge you to read this article. The last segment will be covering the selection of multiple random row. And you should be able to notice the SELECT statement in the PROCEDURE that will be described there. That would be the spot where you add your specific WHERE conditions.

The problem with ORDER BY RAND() is that this operation has complexity of n*log2(n), while the method described in the article that I linked, has almost constant complexity.

Lets assume, that selecting random row from table, which contains 10 entries, using ORDER BY RAND() takes 1 time unit:

  entries  |  time units
-------------------------
       10  |         1     /* if this takes 0.001s */
      100  |        20
    1'000  |       300
   10'000  |     4'000
  100'000  |    50'000
1'000'000  |   600'000     /* then this will need 10 minutes */

And you wrote that you are dealing with table on scale of millions.

like image 99
tereško Avatar answered Dec 04 '22 07:12

tereško