I have a table whose data I need to randomize. By randomizing, I mean using the value from a random row to update another row in the same column. The problem is that the table is big (more than 2,000,000 rows).
I wrote a piece of code that uses a WHILE loop, but it is slow.
Does anyone have a suggestion for a more efficient way to achieve this randomization?
Updating that many rows will itself take significant processing time (CPU and I/O).
Have you measured the relative expense of randomising the rows versus performing the updates?
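One way to measure that is to split the job into a randomising step and an updating step and time each one with SET STATISTICS TIME and SET STATISTICS IO. The sketch below is a minimal example under stated assumptions: SQL Server 2012 or later, and a hypothetical temp table #myTable with a hypothetical column myColumn standing in for the real table. It treats "randomize" as a shuffle (every original value is reused exactly once), and its single set-based UPDATE replaces the row-by-row WHILE loop.

```sql
-- Hypothetical stand-in for the real table: 100,000 rows instead of 2,000,000.
IF OBJECT_ID('tempdb..#myTable') IS NOT NULL DROP TABLE #myTable;
CREATE TABLE #myTable (
    pkID     int         NOT NULL PRIMARY KEY,
    myColumn varchar(50) NULL
);

INSERT INTO #myTable (pkID, myColumn)
SELECT n, CONCAT('value ', n)
FROM (
    SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
) AS numbers;

-- Report CPU time, elapsed time and logical reads for each statement below.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- Step 1, randomising: number the rows once in key order and once in a
-- random order, then pair the two numberings up.
IF OBJECT_ID('tempdb..#shuffle') IS NOT NULL DROP TABLE #shuffle;
SELECT o.pkID, s.myColumn
INTO #shuffle
FROM (SELECT pkID, ROW_NUMBER() OVER (ORDER BY pkID) AS rn
      FROM #myTable) AS o
JOIN (SELECT myColumn, ROW_NUMBER() OVER (ORDER BY NEWID()) AS rn
      FROM #myTable) AS s
  ON s.rn = o.rn;

-- Step 2, updating: one set-based UPDATE instead of a WHILE loop.
UPDATE t
SET t.myColumn = s.myColumn
FROM #myTable AS t
JOIN #shuffle AS s ON s.pkID = t.pkID;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Comparing the times the Messages tab reports for the SELECT ... INTO and for the UPDATE shows which part dominates. On the real table, substitute your own table, key and column names.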
If all you need to do is pick random rows, here is an efficient method for picking a random sample of rows (in this case, roughly 1% of them):
-- Keep each row with a probability of roughly 1%; change 0.01 to alter the sample size.
SELECT * FROM myTable
WHERE 0.01 >= CAST(CHECKSUM(NEWID(), pkID) & 0x7fffffff AS float) / CAST(0x7fffffff AS int);
where pkID is your primary key column.
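To see why this filter keeps roughly 1% of the rows, the sketch below runs it against a hypothetical 100,000-row temp table #sampleSource (an assumption, not part of the original answer) and counts what it keeps. Because NEWID() changes on every call, the count differs between runs but should stay close to 1,000. Unlike ORDER BY NEWID(), which has to sort every row, the filter only evaluates one expression per row.

```sql
-- Hypothetical demo table: 100,000 rows with an int primary key.
IF OBJECT_ID('tempdb..#sampleSource') IS NOT NULL DROP TABLE #sampleSource;
CREATE TABLE #sampleSource (pkID int NOT NULL PRIMARY KEY);

INSERT INTO #sampleSource (pkID)
SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

-- CHECKSUM(NEWID(), pkID) gives a fresh int for every row. Masking it with
-- 0x7fffffff clears the sign bit (range 0 .. 2,147,483,647), and dividing by
-- 2,147,483,647 scales it into [0, 1], so comparing against 0.01 keeps each
-- row with a probability of about 1%.
SELECT COUNT(*) AS sampledRows
FROM #sampleSource
WHERE 0.01 >= CAST(CHECKSUM(NEWID(), pkID) & 0x7fffffff AS float)
             / CAST(0x7fffffff AS int);
```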
This post might be of interest: