Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Select n random rows from SQL Server table

I've got a SQL Server table with about 50,000 rows in it. I want to select about 5,000 of those rows at random. I've thought of a complicated way, creating a temp table with a "random number" column, copying my table into that, looping through the temp table and updating each row with RAND(), and then selecting from that table where the random number column < 0.1. I'm looking for a simpler way to do it, in a single statement if possible.

This article suggest using the NEWID() function. That looks promising, but I can't see how I could reliably select a certain percentage of rows.

Anybody ever do this before? Any ideas?

like image 475
John M Gant Avatar asked May 11 '09 16:05

John M Gant


People also ask

How do I select N random rows in SQL?

To get a single row randomly, we can use the LIMIT Clause and set to only one row. ORDER BY clause in the query is used to order the row(s) randomly. It is exactly the same as MYSQL. Just replace RAND( ) with RANDOM( ).

How do I select random 10 rows in SQL?

The function RAND() generates a random value for each row in the table. The ORDER BY clause sorts all rows in the table by the random number generated by the RAND() function. The LIMIT clause picks the first row in the result set sorted randomly.

What is Rand function in SQL Server?

RAND() function : This function in SQL Server is used to return a random decimal value and this value lies in the range greater than and equal to zero (>=0) and less than 1. If we want to obtain a random integer R in the range i <= R < j, we have to use the expression “FLOOR(i + RAND() * (j − i))”.

What is Tablesample in SQL Server?

Introduced in SQL Server 2005, TABLESAMPLE allows you to extract a sampling of rows from a table in the FROM clause. The rows retrieved are random and they are not in any order. This sampling can be based on a percentage of number of rows.


1 Answers

select top 10 percent * from [yourtable] order by newid() 

In response to the "pure trash" comment concerning large tables: you could do it like this to improve performance.

select  * from [yourtable] where [yourPk] in  (select top 10 percent [yourPk] from [yourtable] order by newid()) 

The cost of this will be the key scan of values plus the join cost, which on a large table with a small percentage selection should be reasonable.

like image 104
Ralph Shillington Avatar answered Oct 14 '22 09:10

Ralph Shillington