We have a very big database WriteDB
, which store raw trading data and we use this table to fast writes. Then with sql scripts I import data from WriteDB
into ReadDB
in comparatively the same table, but extended with some extra values + relation added. Import script is like that:
TRUNCATE TABLE [ReadDB].[dbo].[Price]
GO
INSERT INTO [ReadDB].[dbo].[Price]
SELECT a.*, 0 as ValueUSD, 0 as ValueEUR
from [WriteDB].[dbo].[Price] a
JOIN [ReadDB].[dbo].[Companies] b ON a.QuoteId = b.QuoteID
So initially there is around 130 mil. rows in this table (~50GB). Each day some of them added, some of them changes, so right now we decide not over complicate logic and just re-import all data. The problem that for some reason with time this script works longer and longer, on the almost same amount of data. First run it's take ~1h, now it's already taken 3h
Also SQL Server after import work not well. After import (or during it) if I try to run different queries, even the simplest they often fail with timeout errors.
What is the reason of such bad behavior and how to fix this?
I know that an INSERT on a SQL table can be slow for any number of reasons: Existence of INSERT TRIGGERs on the table. Lots of enforced constraints that have to be checked (usually foreign keys) Page splits in the clustered index when a row is inserted in the middle of the table.
You can use the following methods to speed up inserts: If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements.
Insert would be faster because in case of update you need to first search for the record that you are going to update and then perform the update.
One theory is that your first 50GB dataset has filled available memory for caching. Upon truncating the table, your cache is now effectively empty. This alternating behavior makes effective use of the cache difficult and incurs a substantial number of cache misses / increased IO time.
Consider the following sequence of events:
You could test this theory by comparing the SQL Server cache miss ratio during your first and second load operations.
Some ways to improve performance might be to:
MERGE
statement instead of dumping / loading 50GB of data at a time.If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With