I am writing an app that reads a whole table, does some processing, then writes the resulting data to another table. I am using the SqlBulkCopy class (the .NET version of "bcp in"), which does the insert very fast. But I cannot find any efficient way to select the data in the first place. There is no .NET equivalent of "bcp out", which seems strange to me.
Currently I'm using select * from table_name. For perspective, it takes 2.5 seconds to select 6,000 rows and only 600 ms to bulk insert the same number of rows.
I would expect that selecting data should always be faster than inserting. What is the fastest way to select all rows & columns from a table?
Answers to questions:
Here is my code:
DataTable staging = new DataTable();
using (SqlConnection dwConn = (SqlConnection)SqlConnectionManager.Instance.GetDefaultConnection())
{
    dwConn.Open();
    using (SqlCommand cmd = dwConn.CreateCommand())
    {
        cmd.CommandText = "select * from staging_table";
        // Load() reads every row into memory before returning
        using (SqlDataReader reader = cmd.ExecuteReader())
        {
            staging.Load(reader);
        }
    }
}
select * from table_name is the simplest, easiest and fastest way to read a whole table.
Let me explain why your results lead to wrong conclusions.
It all depends on your hardware, but it is likely that your network is the bottleneck here.
Apart from limiting your query to just the columns you actually use, a plain select is as fast as it will get. There is caching involved here: when you execute the query twice in a row, the second run should be much faster because the data is already cached in memory. Execute dbcc dropcleanbuffers to check the effect of caching.
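To see how much of the 2.5 seconds is caching, it may help to time the bare read on its own, without loading a DataTable. Here is a rough sketch; the class and method names (SelectTiming, TimeSelectMs) and the connectionString parameter are made up for illustration, and the table name is taken from the question.

```csharp
using System;
using System.Data.SqlClient;
using System.Diagnostics;

static class SelectTiming
{
    // Hypothetical helper: times one full pass over the table.
    public static long TimeSelectMs(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("select * from staging_table", conn))
        {
            conn.Open(); // opened before the timer starts, so connect time is excluded

            var watch = Stopwatch.StartNew();
            int rows = 0;
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                // Pull every row across the wire, but keep nothing in memory.
                while (reader.Read())
                {
                    rows++;
                }
            }
            watch.Stop();

            Console.WriteLine("{0} rows in {1} ms", rows, watch.ElapsedMilliseconds);
            return watch.ElapsedMilliseconds;
        }
    }
}
```

Calling TimeSelectMs twice in a row and comparing the two numbers should show the cache effect described above. If both runs are about equally slow, the network or the client side is the more likely culprit.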
If you want to do it as fast as possible, try to implement the processing in T-SQL. That way it can operate directly on the data, right there on the server.
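As an illustration of keeping the work on the server, the sketch below sends a single insert ... select statement from C#, so no rows travel to the client at all. The target table, the column names (id, amount) and the "processing" (doubling a value) are all invented for the example; the real logic would have to be expressible in T-SQL.

```csharp
using System.Data.SqlClient;

static class ServerSideCopy
{
    // Hypothetical helper: returns the number of rows inserted.
    public static int Run(string connectionString)
    {
        // Assumed schema: both tables have id and amount columns.
        const string sql = @"
            insert into target_table (id, amount)
            select id, amount * 2
            from staging_table;";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            // The read, the transformation and the write all happen on the server.
            return cmd.ExecuteNonQuery();
        }
    }
}
```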
Another good tip for speed tuning is to have the table that is being read on one disk (look at filegroups) and the table that is written to on another disk. That way one disk can do a continuous read and the other a continuous write. If both operations happen on the same disk, the heads of the disk keep going back and forth, which seriously degrades performance.
If the logic you're writing cannot be done in T-SQL, you could also have a look at SQL CLR.
Another tip: when you do select * from table, use a datareader if possible. That way you don't materialize the whole thing in memory first.
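A minimal sketch of the datareader approach, assuming the rows can be copied as they are: SqlBulkCopy.WriteToServer accepts an IDataReader, so the reader can be handed straight to the bulk copy and rows are streamed instead of being collected in a DataTable. The destination table name, the batch size and the connectionString parameter are placeholders, not something from the question.

```csharp
using System.Data.SqlClient;

static class StreamingCopy
{
    // Hypothetical helper: streams staging_table into target_table.
    public static void Copy(string connectionString)
    {
        // Two connections: an open reader keeps its connection busy
        // (unless MARS is enabled), so the bulk copy gets its own.
        using (var source = new SqlConnection(connectionString))
        using (var target = new SqlConnection(connectionString))
        {
            source.Open();
            target.Open();

            using (var cmd = new SqlCommand("select * from staging_table", source))
            using (SqlDataReader reader = cmd.ExecuteReader())
            using (var bulk = new SqlBulkCopy(target))
            {
                bulk.DestinationTableName = "target_table"; // placeholder name
                bulk.BatchSize = 5000;                       // arbitrary; tune for your data
                bulk.WriteToServer(reader);                  // rows are streamed, not buffered
            }
        }
    }
}
```

If each row has to be changed on the way through, this exact form won't do. You would need either a while (reader.Read()) loop that builds the output yourself, or your own IDataReader wrapper around the source reader.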
GJ