 

Is there a way to multithread a SqlDataReader?

I have a SQL query that returns over half a million rows to process. The processing of each row doesn't take very long, but I would like to speed it up with some multiprocessing. Considering the code below, is it possible to multithread something like that easily?

using (SqlDataReader reader = command.ExecuteReader())
{
    while (reader.Read())
    {
        // ...process row
    }
}

It would be perfect if I could simply get a cursor at the beginning and another in the middle of the list of results. That way, I could have two threads processing the records. However, the SqlDataReader doesn't allow me to do that...

Any idea how I could achieve that?

asked May 27 '09 by Martin

People also ask

Is there anything faster than SqlDataReader in .NET?

SqlDataReader is the fastest way. Make sure you use the get-by-ordinal methods rather than get-by-column-name, e.g. GetString(1).

Are databases multi threaded?

All Db2 database system applications are multithreaded by default, and are capable of using multiple contexts. You can use the following Db2 APIs to use multiple contexts.

Why do we use SqlDataReader?

It is used to populate an array of objects with the column values of the current row, to advance to the next result when reading the results of SQL statements, and to read records from a SQL Server database. To create a SqlDataReader instance, call the ExecuteReader method of a SqlCommand object.

What is SqlDataReader explain it with relevant example?

The SqlDataReader reads one row of a result set at a time, obtained via a SqlCommand. It is read-only, meaning records can only be read, not edited. It is also forward-only, meaning you cannot go back to a previous row (record).


2 Answers

Set up a producer/consumer queue, with one producer thread that pulls from the reader and enqueues records as fast as it can, but does no "processing". Then have some other number of threads (how many depends on your system) dequeue and process each queued record.
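A minimal sketch of that pattern using `BlockingCollection<T>` (available since .NET 4, so it postdates this question, but it is the idiomatic way to do it today). The producer loop here simulates rows; with a real reader you would run `while (reader.Read())` and copy the column values out with `reader.GetValues(...)` before enqueuing, since the reader itself is not thread-safe:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class ProducerConsumerSketch
{
    public static int Run(int rowCount, int consumerCount)
    {
        // Bounded capacity keeps the producer from racing far ahead of consumers.
        var queue = new BlockingCollection<object[]>(boundedCapacity: 10_000);
        int processed = 0;

        // Producer: with a real SqlDataReader this loop would be
        //   while (reader.Read()) {
        //       var values = new object[reader.FieldCount];
        //       reader.GetValues(values);
        //       queue.Add(values);
        //   }
        var producer = Task.Run(() =>
        {
            for (int i = 0; i < rowCount; i++)
                queue.Add(new object[] { i, $"row {i}" });
            queue.CompleteAdding(); // signals consumers there is nothing more
        });

        // Consumers: dequeue and process records in parallel.
        var consumers = new Task[consumerCount];
        for (int c = 0; c < consumerCount; c++)
        {
            consumers[c] = Task.Run(() =>
            {
                foreach (var row in queue.GetConsumingEnumerable())
                {
                    // ...process row here...
                    Interlocked.Increment(ref processed);
                }
            });
        }

        Task.WaitAll(consumers);
        producer.Wait();
        return processed;
    }
}
```

`GetConsumingEnumerable` blocks until a record is available and exits cleanly once `CompleteAdding` has been called and the queue drains, so no manual signaling is needed.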

answered Oct 11 '22 by Joel Coehoorn


You shouldn't read that many rows on the client.

That being said, you can partition your query into multiple queries and execute them in parallel. That means launching multiple SqlCommands in separate threads and having each one churn through a partition of the result. The A+ question is how to partition the result, and this depends largely on your data and your query:

  1. You can use a range of keys (eg. ID between 1 and 10000, ID between 10001 and 20000 etc.)
  2. You can use an attribute (eg. RecordTypeID IN (1,2), RecordTypeID IN (3,4) etc.)
  3. You can use a synthetic range (ie. ROW_NUMBER() BETWEEN 1 AND 1000 etc.), but this is very problematic to pull off right
  4. You can use a hash (eg. BINARY_CHECKSUM(*)%10 == 0, BINARY_CHECKSUM(*)%10 == 1 etc.)

You just have to be very careful that the partition queries do not overlap and do not block each other during execution (ie. scan the same records and acquire X locks), which would serialize them.
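As a sketch of option 1 (key ranges), the helper below splits a key space into contiguous, non-overlapping ranges; each range would then drive its own `SELECT ... WHERE ID BETWEEN @lo AND @hi` on its own SqlConnection/SqlCommand in its own thread. The table and column names are hypothetical, not from the original question:

```csharp
using System;
using System.Collections.Generic;

static class RangePartitioner
{
    // Splits the key space [minId, maxId] into `parts` contiguous,
    // non-overlapping ranges, one per worker query.
    public static List<(long Lo, long Hi)> Split(long minId, long maxId, int parts)
    {
        var ranges = new List<(long Lo, long Hi)>();
        long span = maxId - minId + 1;
        long size = span / parts;
        long lo = minId;
        for (int i = 0; i < parts; i++)
        {
            // The last range absorbs any remainder from the division.
            long hi = (i == parts - 1) ? maxId : lo + size - 1;
            ranges.Add((lo, hi));
            lo = hi + 1;
        }
        return ranges;
    }
}
```

Each worker thread would then parameterize its query from one range, e.g. `SELECT ... FROM MyTable WHERE ID BETWEEN @lo AND @hi` (a hypothetical query for illustration). Because the ranges are disjoint, the workers scan disjoint sets of records, which is exactly the non-overlapping property the caution above calls for.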

answered Oct 11 '22 by Remus Rusanu