I have a large table with 100,000,000 rows. I'd like to select every n'th row from the table. My first instinct is to use something like this:
SELECT id,name FROM table WHERE id%125000=0
to retrieve a even spread of 800 rows (id is a clustered index)
This technique works fine on smaller data sets but with my larger table the query takes 2.5 minutes. I assume this is because the modulus operation is applied to every row. Is there a more optimal method of row skipping ?
Your query assumes that the IDs are contiguous (and probably they aren't without you realizing this...). Anyway, you should generate the IDs yourself:
select *
from T
where ID in (0, 250000*1, 250000*2, ...)
Maybe you need a TVP to send all IDs because there are so many. Or, you produce the IDs on the server in T-SQL or a SQLCLR function or a numbers table.
This technique allows you to perform index seeks and will be the fastest you can possibly produce. It reads the minimal amount of data possible.
Modulo is not SARGable. SQL Server could support this if Microsoft wanted it, but this is an exotic use case. They will never make modulo SARGable and they shouldn't.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With