I have a SQL query which returns over half a million rows to process... The processing doesn't take very long, but I would like to speed it up a bit with some multiprocessing. Considering the code below, is it possible to multithread something like that easily?
using (SqlDataReader reader = command.ExecuteReader())
{
    while (reader.Read())
    {
        // ...process row
    }
}
It would be perfect if I could simply get a cursor at the beginning and another in the middle of the list of results. That way, I could have two threads processing the records. However, SqlDataReader doesn't allow me to do that...
Any idea how I could achieve that?
SqlDataReader is the fastest way. Make sure you use the get-by-ordinal methods rather than the get-by-column-name ones, e.g. GetString(1).
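For illustration, here is a minimal sketch of ordinal access; the Orders table, its Id and Name columns, and the connection string are placeholders. Resolve each ordinal once with GetOrdinal outside the loop rather than paying the name lookup on every row:

using System;
using System.Data.SqlClient;

class OrdinalAccessSketch
{
    static void Main()
    {
        using (var connection = new SqlConnection("<your connection string>"))
        using (var command = new SqlCommand("SELECT Id, Name FROM Orders", connection))
        {
            connection.Open();
            using (SqlDataReader reader = command.ExecuteReader())
            {
                // Look up each ordinal once per query, not once per row.
                int idOrdinal = reader.GetOrdinal("Id");
                int nameOrdinal = reader.GetOrdinal("Name");
                while (reader.Read())
                {
                    int id = reader.GetInt32(idOrdinal);
                    string name = reader.GetString(nameOrdinal);
                    // ...process row
                }
            }
        }
    }
}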
To create a SqlDataReader instance, you must call the ExecuteReader method of a SqlCommand object. The Read method advances to the next row of the result, GetValues populates an array of objects with the column values of the current row, and NextResult moves on to the next result set when a batch returns several.
The SqlDataReader reads one row of the result at a time. It is read-only, which means we can only read the records; they cannot be edited through it. It is also forward-only, which means you cannot go back to a previous row.
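As a minimal sketch of those methods, assuming a hypothetical batch of two placeholder queries: Read walks forward through the rows of the current result set, GetValues copies the current row into an object array, and NextResult advances to the next result set in the batch:

using System;
using System.Data.SqlClient;

class ReaderBasicsSketch
{
    static void Main()
    {
        using (var connection = new SqlConnection("<your connection string>"))
        using (var command = new SqlCommand(
            "SELECT Id, Name FROM TableA; SELECT Id, Name FROM TableB;", connection))
        {
            connection.Open();
            using (SqlDataReader reader = command.ExecuteReader())
            {
                do
                {
                    while (reader.Read())
                    {
                        // Copy the current row's column values into an array.
                        var values = new object[reader.FieldCount];
                        reader.GetValues(values);
                        Console.WriteLine(string.Join(", ", values));
                    }
                } while (reader.NextResult()); // advance to the next result set
            }
        }
    }
}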
Set up a producer/consumer queue, with one producer thread that pulls from the reader and enqueues records as fast as it can, but does no "processing" itself. Then some other number of consumer threads (how many depends on your system) to dequeue and process each queued record.
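For illustration, here is a minimal sketch of that pattern using BlockingCollection<T>; the connection string, query, and ProcessRow body are placeholders, and the consumer count and queue bound should be tuned to your workload:

using System;
using System.Collections.Concurrent;
using System.Data.SqlClient;
using System.Threading.Tasks;

class ProducerConsumerSketch
{
    static void Main()
    {
        // Bounded queue so the producer cannot run arbitrarily far ahead
        // of the consumers and exhaust memory on queued rows.
        using (var queue = new BlockingCollection<object[]>(boundedCapacity: 10000))
        {
            // Producer: drain the reader as fast as possible, no processing here.
            var producer = Task.Run(() =>
            {
                using (var connection = new SqlConnection("<your connection string>"))
                using (var command = new SqlCommand("SELECT ...", connection))
                {
                    connection.Open();
                    using (SqlDataReader reader = command.ExecuteReader())
                    {
                        while (reader.Read())
                        {
                            var row = new object[reader.FieldCount];
                            reader.GetValues(row); // snapshot the current row
                            queue.Add(row);        // blocks if the queue is full
                        }
                    }
                }
                queue.CompleteAdding();            // tell consumers no more rows are coming
            });

            // Consumers: dequeue and process each record.
            var consumers = new Task[Environment.ProcessorCount];
            for (int i = 0; i < consumers.Length; i++)
            {
                consumers[i] = Task.Run(() =>
                {
                    foreach (object[] row in queue.GetConsumingEnumerable())
                    {
                        ProcessRow(row); // placeholder for the per-row work
                    }
                });
            }

            producer.Wait();
            Task.WaitAll(consumers);
        }
    }

    static void ProcessRow(object[] row)
    {
        // ...process row
    }
}

The bounded capacity matters with half a million rows: it throttles the producer when the consumers fall behind, instead of letting the whole result set pile up in memory.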
You shouldn't read that many rows on the client.
That being said, you can partition your query into multiple queries and execute them in parallel. That means launching multiple SqlCommands in separate threads and having each one churn through a partition of the result (a sketch follows the list). The key question is how to partition the result, and this depends largely on your data and your query:

- ID BETWEEN 1 AND 10000, ID BETWEEN 10001 AND 20000, etc.
- RecordTypeID IN (1,2), RecordTypeID IN (3,4), etc.
- ROW_NUMBER() BETWEEN 1 AND 1000, etc. (but this is very problematic to pull off right)
- BINARY_CHECKSUM(*) % 10 = 0, BINARY_CHECKSUM(*) % 10 = 1, etc.

You just have to be very careful that the partition queries do not overlap and block during execution (i.e. scan the same records and acquire X locks), thus serializing each other.
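For illustration, here is a minimal sketch of the ID-range variant; MyTable, its integer Id key, the column list, the connection string, and the range boundaries are all placeholders. Each task opens its own connection, since a single SqlConnection cannot serve concurrent commands:

using System;
using System.Data.SqlClient;
using System.Threading.Tasks;

class PartitionedReadSketch
{
    static void Main()
    {
        // Non-overlapping ID ranges; derive real boundaries from your data.
        (int From, int To)[] ranges = { (1, 250000), (250001, 500000) };

        Parallel.ForEach(ranges, range =>
        {
            // One connection per partition; connections cannot be shared
            // across concurrently executing commands.
            using (var connection = new SqlConnection("<your connection string>"))
            using (var command = new SqlCommand(
                "SELECT Id, ... FROM MyTable WHERE Id BETWEEN @from AND @to", connection))
            {
                command.Parameters.AddWithValue("@from", range.From);
                command.Parameters.AddWithValue("@to", range.To);
                connection.Open();
                using (SqlDataReader reader = command.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        // ...process row for this partition
                    }
                }
            }
        });
    }
}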