I have a table with more than a millon rows. This table is used to index tiff
images. Each image has fields like date
, number
, etc. I have users that index these images in batches of 500. I need to know if it is better to first insert 500 rows and then perform 500 updates or, when the user finishes indexing, to do the 500 inserts with all the data. A very important thing is that if I do the 500 inserts at first, this time is free for me because I can do it the night before.
So the question is: is it better to do inserts or inserts and updates, and why? I have defined a id
value for each image, and I also have other indices on the fields.
Insert would be faster because in case of update you need to first search for the record that you are going to update and then perform the update.
Insert is more faster than update because in insert there's no checking of data.
For best future query performance, it's better to do an update to keep the same extents. Delete and insert will not necessarily use the same extents. For a table of that size, it would be unlikely to do so. Furthermore, delete can leave "holes" in your data.
Insertion is inserting a new key and update is updating the value of an existing key. If that is the case (a very common case) , update would be faster than insertion because update involves an indexed lookup and changing an existing value without touching the index.
Updates in Sql server result in ghosted rows - i.e. Sql crosses one row out and puts a new one in. The crossed out row is deleted later.
Both inserts and updates can cause page-splits in this way, they both effectively 'add' data, it's just that updates flag the old stuff out first.
On top of this updates need to look up the row first, which for lots of data can take longer than the update.
Inserts will just about always be quicker, especially if they are either in order or if the underlying table doesn't have a clustered index.
When inserting larger amounts of data into a table look at the current indexes - they can take a while to change and build. Adding values in the middle of an index is always slower.
You can think of it like appending to an address book: Mr Z can just be added to the last page, while you'll have to find space in the middle for Mr M.
Doing the inserts first and then the updates does seem to be a better idea for several reasons. You will be inserting at a time of low transaction volume. Since inserts have more data, this is a better time to do it.
Since you are using an id value (which is presumably indexed) for updates, the overhead of updates will be very low. You would also have less data during your updates.
You could also turn off transactions at the batch (500 inserts/updates) level and use it for each individual record, thus reducing some overhead.
Finally, test this out to see the actual performance on your server before making a final decision.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With