Random record from a database table (T-SQL)

Is there a succinct way to retrieve a random record from a SQL Server table?

I would like to randomize my unit test data, so I am looking for a simple way to select a random id from a table. In English, the select would be "Select one id from the table where the id is a random number between the lowest id in the table and the highest id in the table."

I can't figure out a way to do it without having to run the query, test for a null value, then re-run if null.

Ideas?

Jeremy asked Oct 10 '08 13:10



2 Answers

Is there a succinct way to retrieve a random record from a SQL Server table?

Yes

SELECT TOP 1 * FROM YourTable ORDER BY NEWID()

Explanation

A NEWID() is generated for each row and the table is then sorted by it. The first record is returned (i.e. the record with the "lowest" GUID).
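For the unit-test case in the question, where only a random id is needed, the same idea can be narrowed to a single column and captured in a variable. This is a minimal sketch, assuming the table is called YourTable and has an integer key column named Id (both names are placeholders, not from the question):

```sql
-- Assumes a hypothetical integer key column named Id on YourTable.
DECLARE @RandomId int;

-- Each row gets its own NEWID(); the row with the lowest GUID wins.
SELECT TOP (1) @RandomId = Id
FROM YourTable
ORDER BY NEWID();

-- @RandomId stays NULL only when YourTable has no rows.
SELECT @RandomId AS RandomId;
```

Because the query picks from rows that actually exist, gaps in the id sequence do not matter and no re-run on NULL is needed.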

Notes

  1. Version 4 GUIDs (UUIDs) are generated from random or pseudo-random numbers:

    The version 4 UUID is meant for generating UUIDs from truly-random or pseudo-random numbers.

    The algorithm is as follows:

    • Set the two most significant bits (bits 6 and 7) of the clock_seq_hi_and_reserved to zero and one, respectively.
    • Set the four most significant bits (bits 12 through 15) of the time_hi_and_version field to the 4-bit version number from Section 4.1.3.
    • Set all the other bits to randomly (or pseudo-randomly) chosen values.

    —A Universally Unique IDentifier (UUID) URN Namespace - RFC 4122

  2. The alternative SELECT TOP 1 * FROM YourTable ORDER BY RAND() will not work as one might expect. RAND() is evaluated once per query rather than once per row, so every row gets the same value and the order is not randomized.

  3. While GUID values are pseudo-random, you will need a better PRNG for more demanding applications.

  4. Typical performance is less than 10 seconds for around 1,000,000 rows, depending of course on the system. Because the sort key is generated at query time, the query cannot use an index: every row has to be read and sorted, so performance will be relatively limited.
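The difference between RAND() and NEWID() described in the notes can be seen with a small experiment. This is only an illustrative sketch: the table variable @Demo and its Id column are made up for the demonstration, and the multi-row VALUES syntax assumes SQL Server 2008 or later.

```sql
-- Hypothetical five-row table variable, used only for this demonstration.
DECLARE @Demo TABLE (Id int PRIMARY KEY);
INSERT INTO @Demo (Id) VALUES (1), (2), (3), (4), (5);

SELECT Id,
       RAND()  AS RandValue,   -- evaluated once: identical on every row
       NEWID() AS NewIdValue   -- evaluated per row: different on every row
FROM @Demo;
```

Since RandValue is the same for all rows, ORDER BY RAND() gives the sort nothing to distinguish them by, whereas NewIdValue gives each row its own sort key.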

Sklivvz answered Sep 18 '22 21:09


On larger tables you can also use TABLESAMPLE for this to avoid scanning the whole table.

SELECT TOP 1 * FROM YourTable TABLESAMPLE (1000 ROWS) ORDER BY NEWID()

The ORDER BY NEWID() is still required; without it the query would just return whichever sampled rows appear first on the data page.

The number of rows to sample needs to be chosen carefully for the size and definition of the table, and you might consider retry logic in case no row is returned. The maths behind this, and why the technique is not suited to small tables, is discussed here
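The retry logic mentioned above might look something like the following. It is a sketch under assumptions rather than a tested recipe: the Id column, the limit of 5 attempts, and the fallback to a full ORDER BY NEWID() scan are all illustrative choices, and the variable initializers and += operator assume SQL Server 2008 or later.

```sql
-- Assumes a hypothetical integer key column named Id on YourTable.
DECLARE @Id int = NULL;
DECLARE @Attempts int = 0;

-- TABLESAMPLE can return no rows, so retry a few times.
WHILE @Id IS NULL AND @Attempts < 5   -- 5 is an arbitrary limit
BEGIN
    SELECT TOP (1) @Id = Id
    FROM YourTable TABLESAMPLE (1000 ROWS)
    ORDER BY NEWID();

    SET @Attempts += 1;
END;

-- Fallback (an assumption of this sketch): scan the whole table once.
IF @Id IS NULL
    SELECT TOP (1) @Id = Id
    FROM YourTable
    ORDER BY NEWID();

SELECT @Id AS RandomId;
```

When the sampled query returns no rows, the assignment does not run and @Id keeps its previous value of NULL, which is what drives the loop.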

Martin Smith answered Sep 20 '22 21:09