
MySQL vs PHP when retrieving a random item

Tags:

php

mysql

Which is more efficient when managing over 100K records?

A. MySQL

SELECT * FROM user ORDER BY RAND() LIMIT 1;

Of course, after that I would already have all the fields from that record.

B. PHP

Use memcached to have $cache_array hold all the IDs from "SELECT id_user FROM user ORDER BY id_user" for an hour or so, and then:

$id = $cache_array[array_rand($cache_array)]; // array_rand() returns a key, not a value

Of course, after that I have to make a MySQL call with:

SELECT * FROM user WHERE id_user = $id;

So, which is more efficient: A or B?

Andres SK asked Mar 31 '10 15:03



2 Answers

The proper way to answer this kind of question is to do a benchmark. Do a quick and dirty implementation each way and then run benchmark tests to determine which one performs better.
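A minimal sketch of such a benchmark, assuming PHP 7.3 or later for hrtime(). The benchmark() helper and the placeholder workload are hypothetical names made up for illustration; swap in each candidate approach in place of the closure.

```php
<?php
// Hypothetical timing helper: runs $fn $iterations times and returns
// the average wall-clock time per call in milliseconds.
function benchmark(callable $fn, int $iterations = 100): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    $elapsedNs = hrtime(true) - $start;

    return $elapsedNs / $iterations / 1e6; // nanoseconds to milliseconds
}

// Placeholder workload (an assumption); replace it with approach A, B, etc.
$avgMs = benchmark(function () {
    $ids = range(1, 1000);
    return $ids[array_rand($ids)];
}, 50);

printf("average: %.4f ms per call\n", $avgMs);
```

Run each approach against a table of realistic size, since the relative cost of the approaches depends heavily on the row count.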

Having said that, ORDER BY RAND() is known to be slow because it's impossible for MySQL to use an index. MySQL will basically run the RAND() function once for each row in the table and then sort the rows based on what came back from RAND().

Your other idea of storing all the user IDs in memcached and then selecting a random element from the array might perform better if the overhead of memcached proves to be less than the cost of a full table scan. If your dataset is large or staleness is a problem, you may run into issues, though. You're also adding some complexity to your application. I would try to look for another way.
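If you do go the cached-array route, one detail is easy to miss: array_rand() returns a random key, not a value. A small sketch, where the hard-coded $cachedIds array is a hypothetical stand-in for whatever comes back from memcached:

```php
<?php
// Hypothetical stand-in for the ID list that would be read from memcached.
$cachedIds = [3, 17, 42, 108];

// array_rand() returns a random KEY of the array, not a value,
// so the key has to be used to look up the actual user ID.
$key = array_rand($cachedIds);
$id  = $cachedIds[$key];

echo "picked id_user = $id\n";
```

Using the key directly as the ID would only work by accident, and would query for rows such as 0 or 1 that may not exist.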

I'll give you a third option which might outperform both of your suggestions: select a COUNT(id_user) of the rows in your user table, then have PHP generate a random number between 0 and that count minus 1, inclusive. Then run SELECT * FROM user LIMIT 1 OFFSET random-number-generated-by-php;.
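A rough sketch of this third option using PDO. The fetchRandomUser() name is hypothetical, the ORDER BY id_user clause is my addition to make the offset refer to a stable ordering, and the demo uses an in-memory SQLite database as a stand-in for MySQL so the snippet can run on its own (this assumes the pdo_sqlite extension is available).

```php
<?php
// Hypothetical helper: fetch one random row via COUNT(*) plus a random OFFSET.
// Assumes a table named `user` with an `id_user` column, as in the question.
function fetchRandomUser(PDO $pdo): ?array
{
    $count = (int) $pdo->query('SELECT COUNT(*) FROM user')->fetchColumn();
    if ($count === 0) {
        return null; // empty table, nothing to pick
    }

    // random_int() is inclusive on both ends, so the range is 0 .. count - 1.
    $offset = random_int(0, $count - 1);

    // Bind the offset as an integer so it is not quoted as a string.
    $stmt = $pdo->prepare('SELECT * FROM user ORDER BY id_user LIMIT 1 OFFSET :offset');
    $stmt->bindValue(':offset', $offset, PDO::PARAM_INT);
    $stmt->execute();

    // fetch() returns false if rows were deleted between the two queries.
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
}

// Demo: in-memory SQLite standing in for MySQL (an assumption for this sketch).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE user (id_user INTEGER PRIMARY KEY, name TEXT)');
$pdo->exec("INSERT INTO user (id_user, name) VALUES (1, 'a'), (5, 'b'), (9, 'c')");

$user = fetchRandomUser($pdo);
print_r($user);
```

Unlike picking a random ID between 1 and the maximum, the offset approach still works when IDs have gaps. MySQL does have to step past the skipped rows to reach the offset, so large offsets are not free, which is one more reason to benchmark.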

Again, the proper way to answer these types of questions is to benchmark. Anything else is speculation.

Asaph answered Nov 12 '22 21:11



The first one is incredibly slow because

MySQL creates a temporary table with all the result rows and assigns each one of them a random sorting index. The results are then sorted and returned.

This is explained in more detail in this blog post.

ryeguy answered Nov 12 '22 22:11
