
MySQL equally distributed random rows with WHERE clause

I have this table,

person_id   int(10) pk
points      int(6) index
other columns not very important

I have this random-row query, which is very fast on a table with 10M rows:

SELECT person_id
  FROM persons AS r1 JOIN
       (SELECT (RAND() *
                     (SELECT MAX(person_id)
                        FROM persons)) AS id)
        AS r2
 WHERE r1.person_id >= r2.id
 ORDER BY r1.person_id ASC
 LIMIT 1

This is all great but now I wish to show only people with points > 0. Example table:

PERSON_ID      POINTS
1              4
2              6
3              0
4              3

When I append AND points > 0 to the WHERE clause, person_id 3 can't be selected, so a gap is created: whenever the random value lands on person_id 3, person_id 4 is selected instead. This gives person 4 a bigger chance of being chosen. Does anyone have suggestions for how I can adjust the query so that it works with the WHERE clause and gives every matching row the same chance of being selected?
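To make the bias concrete, here is a small Python sketch that computes the exact selection probabilities for the example table above. It only models the "first eligible id >= RAND() * MAX(person_id)" logic; the function name selection_probabilities is made up for illustration.

```python
from fractions import Fraction

def selection_probabilities(eligible_ids, max_id):
    """Exact pick probability per eligible id when the query returns the
    first eligible id >= a uniform random value in [0, max_id)."""
    probs = {}
    previous = 0
    for pid in sorted(eligible_ids):
        # Every random value between the previous eligible id and this one
        # lands on this id, so its share is the width of that interval.
        probs[pid] = Fraction(pid - previous, max_id)
        previous = pid
    return probs

# Example table: ids 1..4, where id 3 has 0 points and is filtered out.
probs = selection_probabilities([1, 2, 4], max_id=4)
for pid, p in probs.items():
    print(pid, p)
```

Person 4 inherits the interval that belonged to person 3, so it is picked half of the time while persons 1 and 2 get a quarter each.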

About the table: person_id is contiguous, with no gaps. About 90% of the rows have 0 points. I need the query for both cases: points = 0 and points > 0.

Before someone says "just use ORDER BY RAND()": that is not a solution for tables with more than a few hundred thousand rows.

Bonus question: is it possible to select x random rows in one query, so that I do not have to run this query several times when I want more random rows?

Important note: performance is key. With 10M+ rows, the query should not take much longer than the current one, which takes 0.0005 seconds; I would prefer to stay under 0.05 seconds.

Last note: If you think the query can never be this fast with the above requirements, but another solution is possible (like fetching 100 rows and showing x random ones that have more than 0 points), please tell me :)

Really appreciate your help and all help is welcome :)

Kevin Vermaat asked Nov 13 '22 05:11


1 Answer

You could generate gap-free row numbers inline for just the records you want to work with, and then generate the random selector from the total number of those records.

Try this (props to the chosen answer here for the row_number generator):

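    SELECT r1.*
    FROM
        -- Number only the eligible rows 1..N, so the filtered set has no gaps.
        -- (Named row_num because ROW_NUMBER is a reserved word in MySQL 8.0.)
        (SELECT p.person_id,
                @curRow := @curRow + 1 AS row_num
         FROM persons AS p,
              (SELECT @curRow := 0) r0
         WHERE p.points > 0) r1,
        -- Random target in [0, N), where N is the number of eligible rows.
        (SELECT COUNT(1) * RAND() AS id
         FROM persons
         WHERE points > 0) r2
    -- Compare against the gap-free row number, not person_id.
    WHERE r1.row_num >= r2.id
    ORDER BY r1.row_num ASC
    LIMIT 1;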

You can mess with it in this sqlfiddle.
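Another option, closer to the "fetch candidates and keep the ones that match" idea from the question, is rejection sampling. Since person_id has no gaps, you can draw a uniform random id, look it up by primary key, and retry when the row fails the points filter. With about 10% of the rows having points > 0, that averages roughly ten primary-key lookups per pick. Below is a minimal sketch in Python, using an in-memory SQLite table as a stand-in for the MySQL table; the function random_person and its parameters are made up for illustration and are not part of any library.

```python
import random
import sqlite3

def random_person(conn, want_positive, rng, max_tries=1000):
    """Rejection sampling: draw uniform ids until one passes the points filter.

    want_positive=True selects rows with points > 0, False selects points = 0.
    Returns None if no matching row is found within max_tries draws.
    """
    max_id = conn.execute("SELECT MAX(person_id) FROM persons").fetchone()[0]
    for _ in range(max_tries):
        candidate = rng.randint(1, max_id)  # uniform over the gap-free id range
        row = conn.execute(
            "SELECT person_id, points FROM persons WHERE person_id = ?",
            (candidate,),
        ).fetchone()
        if row is not None and (row[1] > 0) == want_positive:
            return row[0]
    return None

# Stand-in for the real table, filled with the example data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (person_id INTEGER PRIMARY KEY, points INTEGER)")
conn.executemany("INSERT INTO persons VALUES (?, ?)", [(1, 4), (2, 6), (3, 0), (4, 3)])

rng = random.Random(0)  # fixed seed so the demo is repeatable
counts = {}
for _ in range(12000):
    pid = random_person(conn, True, rng)
    counts[pid] = counts.get(pid, 0) + 1
print(sorted(counts.items()))
```

Each accepted row is equally likely because every id is drawn with the same probability and rejected draws are simply thrown away. For x random rows, you would call the function x times and discard duplicates if needed.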

listik answered Nov 15 '22 07:11