
Efficiently randomize (shuffle) data in a SQL Server table

I have a table whose data I need to randomize. By randomizing, I mean using the value from a random row to update another row in the same column. The problem is that the table is big (more than 2 000 000 rows).

I wrote a piece of code that uses a while loop, but it runs slowly.

Does anyone have a suggestion for a more efficient way to achieve this randomization?

Milhad asked Dec 09 '22 07:12



1 Answer

Updating the rows will itself take significant processing time (CPU + I/O), regardless of how the random rows are chosen.

Have you measured the relative expense of randomising the rows versus performing the updates?

If all you need to do is pick random rows, here's an efficient method to pick a random sample of rows (in this case 1% of the rows):

SELECT * FROM myTable
WHERE 0.01 >= CAST(CHECKSUM(NEWID(), pkID) & 0x7fffffff AS float) / CAST (0x7fffffff AS int)

where pkID is your primary key column.
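
If the goal is to shuffle one column across the whole table rather than sample it, a set-based update usually performs better than a while loop. The following is only a sketch, not something taken from the linked post: `colToShuffle` is a hypothetical column name, and `myTable` and `pkID` are the placeholder names used above. It numbers the rows once in random order and once in key order, then copies values between rows that share a number, so every value is reused exactly once.

```sql
-- Sketch only: colToShuffle is a hypothetical column name;
-- myTable and pkID are the placeholders used in the sample query above.
WITH src AS (
    -- Every value of the column, numbered in random order.
    SELECT colToShuffle,
           ROW_NUMBER() OVER (ORDER BY NEWID()) AS rn
    FROM myTable
),
dst AS (
    -- Every target row, numbered in primary key order.
    SELECT pkID,
           ROW_NUMBER() OVER (ORDER BY pkID) AS rn
    FROM myTable
)
UPDATE t
SET t.colToShuffle = src.colToShuffle
FROM myTable AS t
JOIN dst ON dst.pkID = t.pkID
JOIN src ON src.rn = dst.rn;   -- pair each row with one randomly numbered value
```

This still sorts the full table twice and writes every row, so on 2 000 000 rows it is worth testing on a copy first and comparing it against the loop before relying on it.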

This post might be of interest:

  • Randomising data
Mitch Wheat answered Jan 28 '23 16:01
