Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL server modulus operator to skip to every n'th row on a large table

I have a large table with 100,000,000 rows. I'd like to select every n'th row from the table. My first instinct is to use something like this:

SELECT id,name FROM table WHERE id%125000=0

to retrieve a even spread of 800 rows (id is a clustered index)

This technique works fine on smaller data sets but with my larger table the query takes 2.5 minutes. I assume this is because the modulus operation is applied to every row. Is there a more optimal method of row skipping ?

like image 998
Brian Avatar asked Sep 03 '13 19:09

Brian


1 Answers

Your query assumes that the IDs are contiguous (and probably they aren't without you realizing this...). Anyway, you should generate the IDs yourself:

select *
from T
where ID in (0, 250000*1, 250000*2, ...)

Maybe you need a TVP to send all IDs because there are so many. Or, you produce the IDs on the server in T-SQL or a SQLCLR function or a numbers table.

This technique allows you to perform index seeks and will be the fastest you can possibly produce. It reads the minimal amount of data possible.

Modulo is not SARGable. SQL Server could support this if Microsoft wanted it, but this is an exotic use case. They will never make modulo SARGable and they shouldn't.

like image 138
usr Avatar answered Sep 28 '22 06:09

usr