Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

quick random row selection in Postgres

I have a table in postgres that contains couple of millions of rows. I have checked on the internet and I found the following

SELECT myid FROM mytable ORDER BY RANDOM() LIMIT 1;

It works, but it's really slow... is there another way to make that query, or a direct way to select a random row without reading all the table? By the way 'myid' is an integer but it can be an empty field.

like image 580
Juan Avatar asked Mar 14 '11 10:03

Juan


People also ask

How do I select a random row in PostgreSQL?

There's an easy way to show a random record in a table: SELECT * FROM table_name ORDER BY RANDOM() LIMIT 1; But this query might take a while to finish as it reads the whole table first then take out a random record from it.

How do I randomly select rows from a table?

To get a single row randomly, we can use the LIMIT Clause and set to only one row. ORDER BY clause in the query is used to order the row(s) randomly. It is exactly the same as MYSQL. Just replace RAND( ) with RANDOM( ).


2 Answers

You might want to experiment with OFFSET, as in

SELECT myid FROM mytable OFFSET floor(random() * N) LIMIT 1;

The N is the number of rows in mytable. You may need to first do a SELECT COUNT(*) to figure out the value of N.

Update (by Antony Hatchkins)

You must use floor here:

SELECT myid FROM mytable OFFSET floor(random() * N) LIMIT 1;

Consider a table of 2 rows; random()*N generates 0 <= x < 2 and for example SELECT myid FROM mytable OFFSET 1.7 LIMIT 1; returns 0 rows because of implicit rounding to nearest int.

like image 105
NPE Avatar answered Oct 09 '22 23:10

NPE


PostgreSQL 9.5 introduced a new approach for much faster sample selection: TABLESAMPLE

The syntax is

SELECT * FROM my_table TABLESAMPLE BERNOULLI(percentage);
SELECT * FROM my_table TABLESAMPLE SYSTEM(percentage);

This is not the optimal solution if you want only one row selected, because you need to know the COUNT of the table to calculate the exact percentage.

To avoid a slow COUNT and use fast TABLESAMPLE for tables from 1 row to billions of rows, you can do:

 SELECT * FROM my_table TABLESAMPLE SYSTEM(0.000001) LIMIT 1;
 -- if you got no result:
 SELECT * FROM my_table TABLESAMPLE SYSTEM(0.00001) LIMIT 1;
 -- if you got no result:
 SELECT * FROM my_table TABLESAMPLE SYSTEM(0.0001) LIMIT 1;
 -- if you got no result:
 SELECT * FROM my_table TABLESAMPLE SYSTEM(0.001) LIMIT 1;
 ...

This might not look so elegant, but probably is faster than any of the other answers.

To decide whether you want to use BERNULLI oder SYSTEM, read about the difference at http://blog.2ndquadrant.com/tablesample-in-postgresql-9-5-2/

like image 73
alfonx Avatar answered Oct 10 '22 00:10

alfonx