
Select random rows with seeding

Tags:

c#

sql-server

Using SQL Server, I have a table with around 5.5 million rows and I want to randomly select a set of maybe 120 rows that meet some criteria.

This is somewhat related to "Select n random rows from SQL Server table" and https://msdn.microsoft.com/en-us/library/cc441928.aspx, but my problem is that I want to be able to seed the selection, so that I can pick the same 120 rows consistently and then get a different random set of rows if I use a different seed.

I could do something like this in my application:

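var rand = new Random(seed);
// Pulls every matching row to the client before sampling.
var allExamples = db.myTable.Where(/*some condition*/).ToList();
// Pair each row with a seeded random number, sort by it, keep the first 120.
var subSet = allExamples.Select(x => new { x, r = rand.NextDouble() })
    .OrderBy(x => x.r)
    .Take(120)
    .Select(x => x.x).ToList();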

This works but, as you might guess, it is glacially slow with 5.5 million rows. So I'm really looking for a way to make this work on the SQL Server side, so that I don't have to retrieve and process all the rows.

Matt Burland asked Dec 23 '15 16:12


1 Answer

If you want something that looks random then mix your [PrimaryKey] with some other data...

SELECT *
FROM [your table]
ORDER BY
    CHECKSUM([primarykey]) ^ CHECKSUM('your seed') 

... this will still be a table scan, but it should perform better than pulling the entire set of data to your client just to throw away everything except 120 rows.
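
To tie this back to the question, here is a rough sketch of the same idea with a row limit, a filter, and the seed held in a variable. The names dbo.MyTable, Id, and IsActive are hypothetical placeholders, not taken from the question. The second query is a variant that is not part of the answer as written: it hashes the key and the seed together with HASHBYTES, which may scatter rows better when the key is a sequential integer, since XOR against a single constant can leave neighbouring keys close together in the sort order. It assumes SQL Server 2012 or later for CONCAT and SHA2_256.

```sql
-- Hypothetical names: dbo.MyTable, Id (integer primary key), IsActive.
DECLARE @seed int = 42;  -- same seed and same data => same rows

-- The ordering from the answer, limited to 120 rows that pass a filter.
SELECT TOP (120) t.*
FROM dbo.MyTable AS t
WHERE t.IsActive = 1     -- placeholder for the real criteria
ORDER BY CHECKSUM(t.Id) ^ CHECKSUM(@seed);

-- Variant (an assumption, not from the answer): hash key and seed together.
SELECT TOP (120) t.*
FROM dbo.MyTable AS t
WHERE t.IsActive = 1     -- placeholder for the real criteria
ORDER BY HASHBYTES('SHA2_256', CONCAT(t.Id, ':', @seed));
```

Both queries still read every row that matches the filter, so the cost is a scan plus a sort on the server. The result is only repeatable for a given seed as long as the set of matching rows does not change between runs.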

Matthew Whited answered Sep 21 '22 23:09