
SQL Server slow inserts - SqlBulkCopy performance

I have a table with 30 million rows. The clustered primary key is a code-generated GUID.

The table is as follows:

CREATE TABLE [dbo].[events](
    [imageEventGUID] [uniqueidentifier] NOT NULL,
    [imageSHAID] [nvarchar](256) NOT NULL,
    [queryGUID] [uniqueidentifier] NOT NULL,
    [eventType] [int] NOT NULL,
    [eventValue] [nvarchar](2050) NULL,
    [dateOfEvent] [datetime] NOT NULL,
 CONSTRAINT [PK_store_image_event] PRIMARY KEY CLUSTERED 
(
    [imageEventGUID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

Put simply, it's an image search engine.

  • imageEventGUID is a code-generated unique identifier
  • imageSHAID is the SHA256 of the image URL
  • queryGUID is a code-generated FK (the constraint is excluded from the create statement for brevity)
  • eventType is a number identifying the type of event
  • eventValue is usually the URI of the image, e.g. "http://mywebpage.com/images/image123456789.jpg"

Periodically I insert via SqlBulkCopy (from a DataTable) into this table using pretty standard code:

using (SqlBulkCopy bulk = new SqlBulkCopy(storeConn, SqlBulkCopyOptions.KeepIdentity | SqlBulkCopyOptions.KeepNulls, null))
{
    bulk.DestinationTableName = "[dbo].[events]";
    bulk.WriteToServer(myeventsDataTable);
}

I'm typically trying to insert between 5k and 10k rows in one bulk insert, and the performance is terrible. I used to run this DB on an SSD (only SATA 1 connected) and it was very fast (under 500 ms). I ran out of room on the SSD, so I moved the DB to a 1TB 7200 RPM spinning disk; since then, completion times are over 120 seconds (120,000 ms). While the bulk insert is running I see disk activity of around 1MB/sec and low CPU usage.

I have no other indexes on this table apart from the PK.

My questions to you are:

Can you see anything obvious that I am doing wrong which would cause this?

Is it just a case of 'your spinning disk is just not fast enough for a DB this size'?

What exactly is happening when this data is inserted? Because the PK is the clustered index, is SQL Server rearranging data pages on disk with every insert? The GUIDs being inserted are unordered by nature, so is it possible that this random insert pattern makes the read/write head move around a lot to reach different pages on the disk?

Thanks for your time.

user989056 asked Feb 15 '23 15:02


1 Answer

My guess is that the main issue is your choice of clustered index. The clustered index determines the physical order of records in the table. Since your PK is a GUID (which I'm assuming is generated randomly rather than sequentially), the database has to insert each row in its proper location, which will likely be between two existing records. That can cause page splits, fragmentation, etc.
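To make the page-split argument concrete, here is a toy simulation in Python. It is only a sketch, not a model of SQL Server's actual storage engine: the 25-rows-per-page capacity, the table size, and the `simulate` helper are all assumptions made up for illustration. It keeps keys sorted in fixed-size "pages" and counts how many pages a batch of inserts touches and how many splits it causes.

```python
import bisect
import random

PAGE_CAPACITY = 25  # assumed rows per page, purely illustrative


def simulate(existing_keys, new_keys, capacity=PAGE_CAPACITY):
    """Insert new_keys into a sorted, paged layout.

    Returns (pages_touched, page_splits).
    """
    ordered = sorted(existing_keys)
    # Start with completely full pages (no free space left on any page).
    pages = [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]
    first_keys = [page[0] for page in pages]  # smallest key on each page
    touched, splits = set(), 0
    for key in new_keys:
        # Find the page whose key range should hold this key.
        idx = max(bisect.bisect_right(first_keys, key) - 1, 0)
        page = pages[idx]
        if len(page) >= capacity:
            if idx == len(pages) - 1 and key > page[-1]:
                # Appending past the end of the table: start a fresh page.
                page = []
                pages.append(page)
                first_keys.append(key)
                idx += 1
            else:
                # Insert into the middle of a full page: split it in two.
                half = len(page) // 2
                upper = page[half:]
                del page[half:]
                pages.insert(idx + 1, upper)
                first_keys.insert(idx + 1, upper[0])
                touched.update((id(page), id(upper)))
                splits += 1
                if key >= upper[0]:
                    page, idx = upper, idx + 1
        bisect.insort(page, key)
        first_keys[idx] = page[0]
        touched.add(id(page))
    return len(touched), splits


def compare(existing=100_000, batch=2_000, seed=42):
    """Run the same batch size with ever-increasing keys and with random keys."""
    rng = random.Random(seed)
    sequential = simulate(range(existing), range(existing, existing + batch))
    scattered = simulate(
        [rng.getrandbits(128) for _ in range(existing)],
        [rng.getrandbits(128) for _ in range(batch)],
    )
    return {"sequential": sequential, "random": scattered}


if __name__ == "__main__":
    for name, (pages_touched, page_splits) in compare().items():
        print(f"{name:>10}: pages touched={pages_touched}, splits={page_splits}")
```

With ever-increasing keys, every insert lands on the last page, so the batch fills a handful of new pages and never splits one. With random 128-bit keys, nearly every insert lands on a different page somewhere in the middle of the table, and in this toy setup most of those pages have to split first.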

As for why it's faster on an SSD than on a magnetic drive, I'm no expert, but the likely reason is that an SSD has no head to move, so the scattered page reads and writes caused by random keys cost it far less. Sequential I/O throughput will also be faster on the SSD, but not by that magnitude.
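A rough back-of-envelope estimate, sketched in Python, shows how scattered page access alone could account for a gap this large. Only the rotational latency follows from the 7200 RPM figure; the 8 ms average seek time and the 0.1 ms SSD access time are assumed order-of-magnitude values, not measurements, and the estimate assumes the worst case where every inserted row needs one uncached random page access.

```python
# Half a rotation at 7200 RPM, in milliseconds (about 4.17 ms).
ROTATIONAL_LATENCY_MS = 0.5 * 60_000 / 7200

# Assumed values, for illustration only.
ASSUMED_HDD_SEEK_MS = 8.0    # typical-order average seek time of a desktop drive
ASSUMED_SSD_ACCESS_MS = 0.1  # typical-order random access time of a SATA SSD

HDD_MS_PER_IO = ASSUMED_HDD_SEEK_MS + ROTATIONAL_LATENCY_MS
SSD_MS_PER_IO = ASSUMED_SSD_ACCESS_MS


def random_io_seconds(rows, ms_per_io):
    """Seconds needed if every row costs one uncached random page access."""
    return rows * ms_per_io / 1000.0


if __name__ == "__main__":
    rows = 10_000  # upper end of the batch size mentioned in the question
    print(f"spinning disk: {random_io_seconds(rows, HDD_MS_PER_IO):.0f} s")
    print(f"SSD:           {random_io_seconds(rows, SSD_MS_PER_IO):.1f} s")
```

Under these assumptions a 10k-row batch costs on the order of two minutes on the spinning disk and about a second on the SSD, which is the same order of magnitude as the timings reported in the question. Caching and write batching would change the real numbers, so treat this as a plausibility check only.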

If you can use a numeric auto-increment primary key instead of a GUID, bulk inserts should be MUCH faster. You can still create a unique index on the GUID column to keep lookups by GUID fast.

D Stanley answered Feb 23 '23 02:02