Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

100k Rows Returned in a random order, without a SQL time out please

Ok,

I've been doing a lot of reading on returning a random row set last year, and the solution we came up with was

ORDER BY newid()

This is fine for <5k rows. But when we are getting >10-20k rows we are getting SQL time outs, the Execution planned tells me that 76% of my query cost comes from this line. and removing this line increase the speed by an order of magnitude when we have a large amount of rows.

Our users have a requirement of doing up to 100k rows at a time like this.

To give you all a bit more details.

We have a table with 2.6 million 4 digit alpha-numeric codes. We use a random set of these to gain entry into a venue. For example, if we have an event with a 5000 capacity, a random set of 5000 of these will be drawn from the table then issued to the each customer as a bar-code, then the bar-code scanning app at the door with have the same list of 5000. The reason for using a 4 digit alpha numeric code (and not a stupidly long number like a GUID) is that it easy for people to write the number down (or SMS it to a friend) and just bring the number and have it entered manually, so we don't want large amount of characters. Customers love the last bit btw.

Is there a better way than ORDER BY newid(), or is there a faster way to get 100k random rows from a table with 2.6 mil?

Oh, and we are using MS SQL 2005.

Thanks,

Jo

like image 473
Joe Avatar asked Feb 23 '10 23:02

Joe


People also ask

How do I randomize SQL results?

MySQL select random records using ORDER BY RAND() The function RAND() generates a random value for each row in the table. The ORDER BY clause sorts all rows in the table by the random number generated by the RAND() function. The LIMIT clause picks the first row in the result set sorted randomly.

How do you randomly sample rows in SQL?

To get a single row randomly, we can use the LIMIT Clause and set to only one row. ORDER BY clause in the query is used to order the row(s) randomly. It is exactly the same as MYSQL. Just replace RAND( ) with RANDOM( ).

Is newid () random?

SQL Server NewId() generates a random GUID or unique identifier which can be used to return randomized rows from a SELECT query. T-SQL developers will realize that the return list of a SQL SELECT query is sorted randomly when they place "NEWID() function in the "ORDER BY" clause of the SELECT statement.


1 Answers

There is an MSDN article entitled "Selecting Rows Randomly from a Large Table" that talks about this exact problem and shows a solution (using no sorting but instead using a WHERE clause on a generated column to filter the rows).

The reason your query is slow is that the ORDER BY clause causes the whole table to be copied into tempdb for sorting.

like image 179
adrianbanks Avatar answered Oct 27 '22 23:10

adrianbanks