I have a database with 30 million rows. The PK clustered index is a code generated GUID
.
The table is as follows:
CREATE TABLE [dbo].[events](
[imageEventGUID] [uniqueidentifier] NOT NULL,
[imageSHAID] [nvarchar](256) NOT NULL,
[queryGUID] [uniqueidentifier] NOT NULL,
[eventType] [int] NOT NULL,
[eventValue] [nvarchar](2050) NULL,
[dateOfEvent] [datetime] NOT NULL,
CONSTRAINT [PK_store_image_event] PRIMARY KEY CLUSTERED
(
[imageEventGUID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
Put simply its an image search engine.
imageEventGUID
is code unique identifier, imageSHAID
is the SHA256 of the image URLqueryGUID
is a code generated FK ( excluded from the create statement for brevity )eventType
is a number assigned to what type of event it iseventValue
is usually a URI of the image e.g. "http://mywebpage.com/images/image123456789.jpg"Periodically I insert via SqlBulkCopy
(from a DataTable
) into this table using pretty standard code:
using (SqlBulkCopy bulk = new SqlBulkCopy(storeConn, SqlBulkCopyOptions.KeepIdentity | SqlBulkCopyOptions.KeepNulls, null))
{
bulk.DestinationTableName = "[dbo].[events]";
bulk.WriteToServer(myeventsDataTable);
}
I'm typically trying to insert between 5k and 10k rows in one bulk insert. I'm having terrible insert results from this bulk copy. I used to run this DB on a SSD (only SATA 1 connected) and it was very fast (under 500 ms). I ran out of room on the SSD so I swapped the DB to a 1TB 7200 cache spinning disk, since then completion times are over 120 seconds (120000 MS). When the bulk insert is running I can see disk activity of around 1MB/sec, low CPU usage.
I have no other indexes on this table apart from the PK.
My questions to you are:
Can you see anything obvious that I am doing wrong which would cause this?
Is it just a case of 'your spinning disk is just not fast enough for a DB this size'?
What exactly is happening on the insert of this data? Because it is the clustered index is it re-arranging data pages on disk when an insert is made? It is trying to insert GUIDS which by nature are unordered and so it is possible that this 'random insert nature' is causing the read/write header to move around a lot to different pages on the disk?
Thanks for your time.
My guess is that the main issue is your choice of clustered index. The clustered index determines the physical order or records in the table. Since your PK is a Guid (which I'm assuming are generated randomly rather than sequentially) the database has to insert each row in the proper location, which will likely be between two existing records, which may cause page splits, fragmentation, etc.
As far as why it's faster on an SSD versus a magnetic drive, I'm no expert, but it's likely that the fragmentation process is faster on the SSD due to how it organizes the data. I/O throughput will be faster, but not by that magnitude.
If you can use a numeric autoincrement primary key instead of a GUID, then bulk inserts should be MUCH faster. You can still create unique indices on the GUID column to make queries faster.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With