Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast mysql random weighted choice on big database

Tags:

sql

random

mysql

I build a website where i need to choose random-weighted record from database.

There is a snipped of code in SQL : select one row randomly, but taking into account a weight

SELECT t.*, RAND() * t.weight AS w 
FROM table t 
ORDER BY w DESC
LIMIT 1

It works fine on small sample of records.

When trying on close to 1 mln records it gets slow ( 1.3 - 1.8 second) on my local machine, and i guess that i would take even longer on even bigger sets.

How could it be optimised? Are there better ways to randomly choose weighted record?

My attempt at it would be to calculate the weights on regular basis, store them in separate table, choose random number programmaticaly and search for closest record to this number.

like image 225
Jask Avatar asked Nov 01 '22 05:11

Jask


1 Answers

You could partition the data based on weight, then randomly select a partition.

Determine the partition to use: O(n)

SELECT Weight, FLOOR(RAND()*COUNT(*)) as Target 
FROM test 
GROUP BY Weight
ORDER BY RAND()*(Weight)*count(Weight)/100 DESC
LIMIT 1;

Use Weight and Target from previous query to get result: O( Log(n) )

SELECT test.*
FROM test
WHERE Weight = $Weight
LIMIT $Target, 1

Test it:

CREATE TABLE `test` (
  `Id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `Weight` int(11) NOT NULL,
  PRIMARY KEY (`Id`),
  KEY `Weight` (`Weight`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;


insert into test (Weight) ( select FLOOR(RAND()*1000) );

run 20 times to create 1 million test rows:

insert into test (Weight) select FLOOR(rand()*1000) as Weight from test;

The first query runs in O(n) because of the GROUP BY. You can get it down to log(n) running time if you maintain a second table that keeps track of the count for each weight.

On my database with 8 million rows in the test table the first query runs in (6.089 s) and the second in (0.001 s)

like image 129
Michael Benjamin Avatar answered Nov 08 '22 07:11

Michael Benjamin