I have the following (simplified) code which I'd like to optimise for speed:
long inputLen = 50000000; // 50 million
DataTable dataTable = new DataTable();
DataRow dataRow;
object[] objectRow = new object[4];
while (inputLen-- > 0)
{
    objectRow[0] = ...
    objectRow[1] = ...
    objectRow[2] = ...

    // Generate output for this input
    output = ...

    for (int i = 0; i < outputLen; i++) // outputLen can range from 1 to 20,000
    {
        objectRow[3] = output[i];
        dataRow = dataTable.NewRow();
        dataRow.ItemArray = objectRow;
        dataTable.Rows.Add(dataRow);
    }
}

// Bulk copy
SqlBulkCopy bulkTask = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null);
bulkTask.DestinationTableName = "newTable";
bulkTask.BatchSize = dataTable.Rows.Count;
bulkTask.WriteToServer(dataTable);
bulkTask.Close();
I'm already using SqlBulkCopy in an attempt to speed things up, but it appears that populating the DataTable itself is the slow part.
I don't know how DataTables work, so I'm wondering whether I'm creating unnecessary overhead by first filling a reusable array, then assigning it to a DataRow, and then adding the DataRow to the DataTable. Or is using a DataTable not optimal in the first place? The input comes from a database.
I don't care much about LOC, just speed. Can anyone give some advice on this?
For such a big table, you should instead use the
public void WriteToServer(IDataReader reader)
method.
It may mean you'll have to implement a "fake" IDataReader yourself (if you don't get the data from an existing IDataReader), but this way you'll get streaming from end to end, and will avoid first materializing a couple of hundred million rows in memory only to iterate over them again for the copy.
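Below is a minimal sketch of such a "fake" reader, assuming the rows can be produced lazily as object[] arrays already in destination-column order. Only the members SqlBulkCopy is commonly reported to use (Read, FieldCount, GetValue, GetOrdinal, IsDBNull) do real work; the rest are stubbed, so verify the behaviour against your target framework version. The class name and constructor parameters are illustrative, not an existing API.

using System;
using System.Collections.Generic;
using System.Data;

public sealed class StreamingRowReader : IDataReader
{
    private readonly IEnumerator<object[]> _rows;
    private readonly string[] _columnNames;
    private object[] _current;

    public StreamingRowReader(IEnumerable<object[]> rows, string[] columnNames)
    {
        _rows = rows.GetEnumerator();
        _columnNames = columnNames;
    }

    // Members SqlBulkCopy relies on.
    public int FieldCount => _columnNames.Length;

    public bool Read()
    {
        if (!_rows.MoveNext()) return false;
        _current = _rows.Current;
        return true;
    }

    public object GetValue(int i) => _current[i];
    public bool IsDBNull(int i) => _current[i] == null || _current[i] == DBNull.Value;
    public int GetOrdinal(string name) => Array.IndexOf(_columnNames, name);
    public string GetName(int i) => _columnNames[i];

    // Housekeeping.
    public void Close() { }
    public void Dispose() => _rows.Dispose();
    public bool IsClosed => false;
    public int Depth => 0;
    public int RecordsAffected => -1;
    public bool NextResult() => false;
    public DataTable GetSchemaTable() => null;

    // Remaining IDataRecord members, stubbed or delegating to GetValue.
    public object this[int i] => GetValue(i);
    public object this[string name] => GetValue(GetOrdinal(name));
    public int GetValues(object[] values) { _current.CopyTo(values, 0); return _current.Length; }
    public Type GetFieldType(int i) => _current?[i]?.GetType() ?? typeof(object);
    public string GetDataTypeName(int i) => GetFieldType(i).Name;
    public bool GetBoolean(int i) => (bool)GetValue(i);
    public byte GetByte(int i) => (byte)GetValue(i);
    public char GetChar(int i) => (char)GetValue(i);
    public DateTime GetDateTime(int i) => (DateTime)GetValue(i);
    public decimal GetDecimal(int i) => (decimal)GetValue(i);
    public double GetDouble(int i) => (double)GetValue(i);
    public float GetFloat(int i) => (float)GetValue(i);
    public Guid GetGuid(int i) => (Guid)GetValue(i);
    public short GetInt16(int i) => (short)GetValue(i);
    public int GetInt32(int i) => (int)GetValue(i);
    public long GetInt64(int i) => (long)GetValue(i);
    public string GetString(int i) => (string)GetValue(i);
    public long GetBytes(int i, long fieldOffset, byte[] buffer, int bufferOffset, int length) => throw new NotSupportedException();
    public long GetChars(int i, long fieldOffset, char[] buffer, int bufferOffset, int length) => throw new NotSupportedException();
    public IDataReader GetData(int i) => throw new NotSupportedException();
}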
Instead of holding a huge DataTable in memory, I would suggest implementing an IDataReader
that serves up the data as the bulk copy runs. This reduces the need to keep everything in memory up front, and should therefore improve performance.
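For completeness, here is a hedged example of wiring a reader like the sketch above into the original bulk copy. GenerateRows, connection, and the column names are assumptions standing in for your real code: the idea is that the while/for loops from the question move into an iterator that yields one object[] per output row, so rows are produced only as WriteToServer pulls them.

// GenerateRows() is a hypothetical iterator wrapping the question's loops:
// it would yield one object[] { colA, colB, colC, output[i] } per output row.
string[] columns = { "ColA", "ColB", "ColC", "Output" }; // assumed destination columns
using (var reader = new StreamingRowReader(GenerateRows(), columns))
using (var bulkTask = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null))
{
    bulkTask.DestinationTableName = "newTable";
    bulkTask.BatchSize = 50000; // pick a fixed batch size; the total row count is no longer known up front
    bulkTask.WriteToServer(reader);
}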