I have a huge table with more than 10 million rows, and I need to efficiently grab a random sample of 5000 rows from it. Some constraints reduce the set of rows I am sampling from to about 9 million. I tried using ORDER BY NEWID(), but that query takes too long because it has to do a table scan of all rows. Is there a faster way to do this?

Select random sampling from sqlserver quickly

2 Answers

If pseudo-random sampling is acceptable and you're on SQL Server 2005/2008, take a look at TABLESAMPLE. Here is a row-based example from SQL Server 2008 using AdventureWorks2008:

USE AdventureWorks2008;
GO

SELECT FirstName, LastName
FROM Person.Person
TABLESAMPLE (100 ROWS)
WHERE EmailPromotion = 2;

The catch is that TABLESAMPLE isn't truly random at the row level: it picks a random set of physical data pages and returns every row on each page it picks, so the requested row count is only approximate. The WHERE clause is applied after sampling, which trims the result further. You may not get back exactly 5000 rows unless you sample more than you need and limit with TOP as well. If you're on SQL Server 2000, you'll have to either generate a temporary table of matching primary keys to sample from, or use a method based on NEWID().
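
As a rough sketch of combining TABLESAMPLE with TOP for the 5000-row case in the question: oversample at the page level, then shuffle only the sampled rows with NEWID() and cap the result. The table dbo.BigTable and the filter column IsActive are hypothetical stand-ins for the asker's table and constraints, and the 1 percent figure is an assumption that would need tuning against the real data.

```sql
-- Hypothetical names: dbo.BigTable and IsActive are placeholders, not from the question.
SELECT TOP (5000) *
FROM dbo.BigTable
TABLESAMPLE (1 PERCENT)   -- roughly 100,000 of 10 million rows, chosen page by page
WHERE IsActive = 1        -- the filter runs after sampling, so oversample to leave headroom
ORDER BY NEWID();         -- sorts only the sampled rows, not the whole table
```

Because the rows still come from a limited set of pages, the result is clustered by physical location. If fewer than 5000 rows come back, the sampling percentage is too low for the filter.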

answered Oct 18 '22 14:10

K. Brian Kelley

Have you looked into using the TABLESAMPLE clause?

For example:

select *
from HumanResources.Department tablesample (5 percent)
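
If the same sample is needed across runs, TABLESAMPLE also accepts a REPEATABLE seed. A small sketch on the same table; the seed value 42 is arbitrary:

```sql
-- The same seed returns the same sample as long as the table's data has not changed.
SELECT *
FROM HumanResources.Department
TABLESAMPLE (5 PERCENT) REPEATABLE (42);
```

Since sampling is done per page, a table small enough to fit on a single page will return either all of its rows or none, so this is more informative on a large table.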

answered Oct 18 '22 13:10

John Sansom

Tags: performance, sql, database, sql-server, random

Byron Whitlock
