I'm loading large amounts of data from a text file into SQL Server. Currently each record is inserted (or updated) in a separate transaction, but this leaves the DB in a bad state if a record fails.
I'd like to put it all in one big transaction. In my case, I'm looking at ~250,000 inserts or updates and maybe ~1,000,000 queries. The text file is roughly 60MB.
Is it unreasonable to put the entire operation into one transaction? What's the limiting factor?
Not only is it not unreasonable, it's a must if you want to preserve integrity when a record fails, so you get an "all or nothing" import as you note. 250,000 inserts or updates will be no problem for SQL Server to handle, but I would take a look at what those million queries are. If they're not needed to perform the data modification, take them out of the transaction so they don't slow down the whole process.
You have to consider that while a transaction is open (regardless of its size), locks are held on the tables it touches, and a lengthy transaction like yours can block other users who are trying to read those tables at the same time. If you expect the import to be big and time-consuming and the system will be under load, consider running the whole process overnight (or during any non-peak hours) to mitigate the effect.
As for the size, there is no specific transaction size limit in SQL Server; a transaction can theoretically modify any amount of data. The practical limit is the size of the transaction log file of the target database. The engine records all the modified data in this file while the transaction is in progress (so it can roll the changes back if needed), so the file will grow. The log must have enough free space or be allowed to grow, and there must be enough disk space for that growth. The row and table locks that the engine takes on the affected tables also consume memory, so the server needs enough free memory for all this plumbing too. Anyway, 60 MB of data is generally too little to worry about, and while 250,000 rows is considerable, it isn't that much either, so any decent-sized server will handle it without problems.
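As a rough illustration, this is a minimal T-SQL sketch of the "all or nothing" pattern described above. The statements inside the TRY block are placeholders for your own inserts/updates, and THROW assumes SQL Server 2012 or later:

    BEGIN TRANSACTION;
    BEGIN TRY
        -- all of your inserts/updates go here, e.g.:
        -- INSERT INTO dbo.TargetTable (...) VALUES (...);
        -- UPDATE dbo.TargetTable SET ... WHERE ...;

        COMMIT TRANSACTION;   -- everything succeeded
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION;   -- undo the whole import on any failure
        THROW;                      -- re-raise the error to the caller
    END CATCH;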
SQL Server can handle transactions of that size. We use a single transaction to bulk load several million records.
The most expensive part of a database operation is usually the client/server round trips and network traffic. For inserts and updates, indexing and logging are also expensive, but you can mitigate those costs by using the correct loading techniques (see below). You really want to limit the number of connections and the amount of data transferred between client and server.
To that end, you should consider bulk loading the data using SSIS or C# with SqlBulkCopy. Once everything is bulk loaded, you can use set-based operations ON THE SERVER to update or verify your data.
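For example, here is a rough T-SQL sketch of that pattern. The file path, staging table, and column names are hypothetical placeholders, and a C# SqlBulkCopy into the staging table can replace the BULK INSERT step:

    -- 1) Load the whole text file into a staging table in one shot.
    --    (SqlBulkCopy from C# into dbo.StagingRecords does the same job.)
    BULK INSERT dbo.StagingRecords
    FROM 'C:\import\data.txt'
    WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n', TABLOCK);

    -- 2) Apply the inserts/updates to the real table as one set-based
    --    statement, wrapped in a single transaction.
    BEGIN TRANSACTION;

    MERGE dbo.TargetRecords AS t
    USING dbo.StagingRecords AS s
        ON t.RecordId = s.RecordId
    WHEN MATCHED THEN
        UPDATE SET t.Value = s.Value
    WHEN NOT MATCHED THEN
        INSERT (RecordId, Value) VALUES (s.RecordId, s.Value);

    COMMIT TRANSACTION;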
Take a look at this question for more suggestions on optimizing data loads. It is about C#, but a lot of the information applies equally to SSIS or other loading methods: What's the fastest way to bulk insert a lot of data in SQL Server (C# client)?