Efficiently randomize (shuffle) data in SQL Server table

Question

I have a table whose data I need to randomize. By randomizing, I mean using the value from a random row to update another row in the same column. The problem is that the table is big (more than 2,000,000 rows).

I wrote code that uses a WHILE loop, but it runs slowly.

Does anyone have a suggestion for a more efficient way to do this randomization?

Mitch Wheat · Accepted Answer

Updating that many rows will itself take significant processing time (CPU + I/O).

Have you measured the relative expense of randomising the rows versus performing the updates?
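One way to take that measurement, sketched under assumptions: `#costDemo` and its `someColumn` are hypothetical stand-ins for your own table, filled with demo rows. `SET STATISTICS IO ON` and `SET STATISTICS TIME ON` make SQL Server print logical reads and CPU/elapsed time for each statement in the Messages output, so the randomising step and the update step can be compared on their own.

```sql
-- Hypothetical demo table (#costDemo, someColumn); substitute your own.
CREATE TABLE #costDemo (
    pkID       INT          NOT NULL PRIMARY KEY,
    someColumn NVARCHAR(50) NOT NULL
);

-- Fill it with 100,000 rows, using a cross join of a catalog view as a row source.
INSERT INTO #costDemo (pkID, someColumn)
SELECT n, CONCAT(N'value-', n)
FROM (SELECT TOP (100000)
             ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_objects AS a
      CROSS JOIN sys.all_objects AS b) AS nums;

-- Report logical reads and CPU/elapsed time for each statement below.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- 1) Randomising only: build a random ordering, with no writes to the table.
SELECT pkID,
       ROW_NUMBER() OVER (ORDER BY NEWID()) AS rn
INTO #ordering
FROM #costDemo;

-- 2) Updating only: rewrite every row's value, with no randomisation.
UPDATE #costDemo
SET someColumn = REVERSE(someColumn);

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

The ratio between the two statements matters more than the absolute figures; running the same comparison against a copy of the real 2,000,000-row table gives representative numbers.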

If all you need to do is pick random rows, here's an efficient method to pick a random sample of rows (in this case, about 1% of them):

-- Masked CHECKSUM(NEWID(), pkID) is scaled to [0, 1]; rows at or below 0.01 (~1%) are kept.
SELECT * FROM myTable
WHERE 0.01 >= CAST(CHECKSUM(NEWID(), pkID) & 0x7fffffff AS float) / CAST (0x7fffffff AS int)

where pkID is your primary key column.
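The sample query above only selects rows; it does not move any values. For the original problem (shuffling one column's values across all rows without a WHILE loop), a common set-based approach, which is not part of this answer, is to number the rows once in primary-key order and once in random NEWID() order, then join the two numberings so that each row receives the value from a randomly chosen row. Every value is used exactly once, so the column ends up as a permutation of itself (some rows may happen to keep their own value). A minimal sketch, assuming a hypothetical table `#myTable(pkID, someColumn)` and a hypothetical helper table `#shuffled`; the random numbering is materialised first so NEWID() is evaluated once per row.

```sql
-- Hypothetical demo table; replace with your own table and column names.
CREATE TABLE #myTable (
    pkID       INT          NOT NULL PRIMARY KEY,
    someColumn NVARCHAR(50) NOT NULL
);

INSERT INTO #myTable (pkID, someColumn)
VALUES (1, N'alpha'), (2, N'bravo'), (3, N'charlie'), (4, N'delta'), (5, N'echo');

-- Step 1: snapshot every value together with a random ordinal 1..N.
-- Materialising this once means NEWID() is evaluated once per row.
SELECT someColumn AS newValue,
       ROW_NUMBER() OVER (ORDER BY NEWID()) AS rn
INTO #shuffled
FROM #myTable;

CREATE UNIQUE CLUSTERED INDEX IX_shuffled_rn ON #shuffled (rn);

-- Step 2: number the target rows 1..N in primary-key order and give
-- row n the value that landed at random position n.
WITH ordered AS (
    SELECT someColumn,
           ROW_NUMBER() OVER (ORDER BY pkID) AS rn
    FROM #myTable
)
UPDATE o
SET o.someColumn = s.newValue
FROM ordered AS o
JOIN #shuffled AS s ON s.rn = o.rn;

DROP TABLE #shuffled;
```

On a 2,000,000-row table the UPDATE runs as a single transaction and logs every changed row, so expect noticeable log growth; if that is a concern, the join can be restricted to ranges of `rn` and run in batches.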

This post might be of interest:

Randomising data

Tags: sql, sql-server, random

Milhad
