
Fastest way to insert into a SQL Server table from .NET code?

What is the fastest way to do this:

  • One table, no references that I cannot prefill (i.e. there is one reference key there, but I have all the data filled in)
  • LOTS of data. We are talking about hundreds of millions of rows per day, coming in dynamically through an API
  • Requests must / should be processed as soon as feasible in a near real-time scenario (i.e. no writing out to a file for upload once per day). 2 seconds is the normal maximum delay
  • Separate machines for data / application and the SQL Server

What I do now:

  • Aggregate up to 32*1024 rows into an array, then queue it.
  • Read the queue in 2-3 threads. Insert into database using SqlBulkCopy.
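In outline, that pipeline looks something like the sketch below. The Tick class, the dbo.Ticks destination table and its column names are illustrative assumptions, not the actual code:

```csharp
// Rough sketch of the buffer-and-bulk-copy pipeline described above.
// The Tick class, connection string, dbo.Ticks table and column names are
// illustrative assumptions, not taken from the question.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Linq;
using System.Threading.Tasks;

public class Tick
{
    public DateTime Timestamp;
    public int InstrumentId;
    public char DataType;
    public double Price;
    public int Volume;
}

public class TickPipeline
{
    private const int BatchSize = 32 * 1024;   // rows per queued batch, as in the question
    private readonly BlockingCollection<List<Tick>> _queue =
        new BlockingCollection<List<Tick>>(boundedCapacity: 64);

    // Producer: aggregate incoming ticks into fixed-size batches and queue them.
    public void Produce(IEnumerable<Tick> incoming)
    {
        var buffer = new List<Tick>(BatchSize);
        foreach (var tick in incoming)
        {
            buffer.Add(tick);
            if (buffer.Count == BatchSize)
            {
                _queue.Add(buffer);
                buffer = new List<Tick>(BatchSize);
            }
        }
        if (buffer.Count > 0) _queue.Add(buffer);
        _queue.CompleteAdding();
    }

    // Consumers: 2-3 threads drain the queue and push each batch with SqlBulkCopy.
    public Task[] StartConsumers(string connectionString, int threads = 3)
    {
        return Enumerable.Range(0, threads).Select(_ => Task.Run(() =>
        {
            foreach (var batch in _queue.GetConsumingEnumerable())
            {
                using (var conn = new SqlConnection(connectionString))
                {
                    conn.Open();
                    using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "dbo.Ticks" })
                    {
                        bulk.WriteToServer(ToDataTable(batch));
                    }
                }
            }
        })).ToArray();
    }

    private static DataTable ToDataTable(List<Tick> batch)
    {
        var t = new DataTable();
        t.Columns.Add("Timestamp", typeof(DateTime));
        t.Columns.Add("InstrumentId", typeof(int));
        t.Columns.Add("DataType", typeof(string));
        t.Columns.Add("Price", typeof(double));
        t.Columns.Add("Volume", typeof(int));
        foreach (var tick in batch)
            t.Rows.Add(tick.Timestamp, tick.InstrumentId, tick.DataType.ToString(), tick.Price, tick.Volume);
        return t;
    }
}
```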

I get about 60k-75k rows imported per second, which is not enough, but quite close. I would love to hit 250,000 rows per second.

So far nothing is really maxed out. I spend about 20% of the time in "network I/O" blocks, and one core is about 80% loaded on the CPU side. Discs are writing out 7-14 MB and are mostly idle. Average queue length on a RAID 10 of 6 Raptors is 0.25.

Does anyone have any idea how to speed this up? A faster server (so far it is virtual: 8 GB RAM, 4 cores, physical disc pass-through for data)?


Adding some clarifications:

  • This is a 2008 R2 Enterprise SQL Server on a 2008 R2 server. The machine has 4 cores and 8 GB RAM, all 64 bit. The 80% load mentioned above is on one core; overall this machine shows about 20% CPU load.
  • The table is simple and has no primary key, only an index on a relational reference (instrument reference) and a timestamp that is unique within a set of instruments (so this is not enforced).
  • The fields on the table are: timestamp, instrument reference (no enforced foreign key), data type (char(1), one of a number of characters indicating what data is posted), price (double) and volume (int). As you can see, this is a VERY thin table. The data in question is tick data for financial instruments. A sketch of a matching table definition follows this list.
  • The question is also about hardware etc., mostly because I see no real bottleneck. I am inserting in multiple transactions and that gives me a benefit, but a small one. Discs and CPU are not showing significant load, and network I/O wait is high (300 ms/second, 30% at the moment), but this is on the same virtualization platform which runs JUST the two servers and has enough cores to run all of it. I am pretty much open to "buy another server", but I want to identify the bottleneck first, especially given that at the end of the day I am not grasping what the bottleneck is. Logging is irrelevant - the bulk inserts do NOT go into the data log as data (there is no clustered index).
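For concreteness, a table matching that description might be created roughly as follows. The table and column names, the exact SQL types, and the choice of one composite index are illustrative assumptions; the question only states the column meanings, an index on the instrument reference and timestamp, and the absence of a primary key:

```csharp
// Illustrative only: creates a table shaped like the one described above.
// Names and exact types are assumptions; the question specifies timestamp,
// instrument reference (not enforced), char(1) data type, double price,
// int volume, an index on the instrument reference, and no primary key.
using System.Data.SqlClient;

public static class Schema
{
    public static void EnsureTickTable(string connectionString)
    {
        const string ddl = @"
CREATE TABLE dbo.Ticks
(
    [Timestamp]  datetime2 NOT NULL,
    InstrumentId int       NOT NULL,   -- instrument reference, no enforced foreign key
    DataType     char(1)   NOT NULL,   -- which kind of data is posted
    Price        float     NOT NULL,   -- double precision
    Volume       int       NOT NULL
);
CREATE NONCLUSTERED INDEX IX_Ticks_Instrument ON dbo.Ticks (InstrumentId, [Timestamp]);";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(ddl, conn))
        {
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}
```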

Would vertical partitioning help, for example using a byte (tinyint) field that splits the instrument universe into, say, 16 tables, so I could do up to 16 inserts at the same time? As the data actually comes from different exchanges, I could make one partition per exchange. This would be a natural split field (it actually lives on the instrument, but I could duplicate the data here).
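On the insert side, such a split could look roughly like the sketch below. The per-exchange table names (dbo.Ticks_00 through dbo.Ticks_15) and the caller-supplied selector for the split field are illustrative assumptions:

```csharp
// Sketch of splitting a batch by exchange and bulk copying each slice in
// parallel into its own table. Table names (dbo.Ticks_<NN>) and the
// getExchange selector are assumptions for illustration.
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Linq;
using System.Threading.Tasks;

public static class PartitionedInsert
{
    public static void InsertByExchange(string connectionString,
                                        IEnumerable<DataRow> ticks,
                                        Func<DataRow, byte> getExchange)
    {
        // Group the batch by the split field (exchange / tinyint bucket).
        var slices = ticks.GroupBy(getExchange);

        Parallel.ForEach(slices, slice =>
        {
            var table = slice.First().Table.Clone();   // same schema, empty copy
            foreach (var row in slice) table.ImportRow(row);

            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();
                using (var bulk = new SqlBulkCopy(conn))
                {
                    bulk.DestinationTableName = $"dbo.Ticks_{slice.Key:D2}";
                    bulk.WriteToServer(table);
                }
            }
        });
    }
}
```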


Some more clarifications: I got the speed even higher (90k rows per second); it is now clearly limited by network I/O between the machines, which could be VM switching.

What I do now is use a connection per 32k rows, set up a temp table, insert into it with SqlBulkCopy, THEN use ONE SQL statement to copy into the main table - this minimizes any lock time on the main table.
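In outline, that staging pattern looks like the following rough sketch; the #TickStage name and the exact SQL statements are illustrative, not quoted from the actual code:

```csharp
// Rough outline of the staging pattern described above: bulk copy into a temp
// table on the same connection, then move the rows to the main table with one
// INSERT ... SELECT. Table and column layout are assumptions.
using System.Data;
using System.Data.SqlClient;

public static class StagedBulkInsert
{
    public static void Insert(string connectionString, DataTable batch)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();   // the temp table lives for the lifetime of this connection

            using (var create = new SqlCommand(
                "SELECT TOP 0 * INTO #TickStage FROM dbo.Ticks;", conn))
            {
                create.ExecuteNonQuery();
            }

            using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "#TickStage" })
            {
                bulk.WriteToServer(batch);
            }

            // One statement moves the whole batch, keeping locks on the main table short.
            using (var move = new SqlCommand(
                "INSERT INTO dbo.Ticks SELECT * FROM #TickStage;", conn))
            {
                move.ExecuteNonQuery();
            }
        }
    }
}
```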

Most waiting time is now still on network I/O. It seems I am running into VM-related issues there. I will move to physical hardware in the next months ;)

asked Jan 20 '11 12:01 by TomTom


2 Answers

If you manage 70k rows per second, you're very lucky so far. But I suspect it's because you have a very simple schema.

I can't believe you are asking about this kind of load on:

  • virtual server
  • single array
  • SATA disks

The network and CPUs are shared, and I/O is restricted: you can't use all resources. Any load stats you see are not very useful. I suspect the network load you see is traffic between the 2 virtual servers, and you'll become I/O bound if you resolve this.

Before I go on, read this: 10 lessons from 35K tps. He wasn't using a virtual box.

Here is what I'd do if you want to ramp up volumes, assuming no SAN and no DR capability.

  • Buy 2 big physical servers; exact CPU spec is kind of irrelevant, max out the RAM, go with an x64 install
  • Disks + controllers = fastest spindles, fastest SCSI. Or a stonking great NAS
  • 1000 Mb+ NICs
  • RAID 10 with 6-10 disks for one log file, for your database only
  • Remaining disks as RAID 5 or RAID 10 for the data file

For reference, our peak load is 12 million rows per hour (16 cores, 16 GB, SAN, x64), but we have complexity in the load. We are not at capacity.

answered Sep 22 '22 03:09 by gbn


Are there any indexes on the table that you could do without? EDIT: asking while you were typing.

Is it possible to turn the price into an integer, and then divide by 1000 or whatever on queries?
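For example, a rough sketch of that idea; the factor of 1000 and the PriceMilli name are illustrative only:

```csharp
// Sketch of the scaled-price idea: store price * 1000 as an int and divide
// back on queries. The scale factor and column name are assumptions.
using System;

public static class PriceScaling
{
    public static int ToMilli(double price) => (int)Math.Round(price * 1000);

    public static double FromMilli(int priceMilli) => priceMilli / 1000.0;

    // On the SQL side the reverse conversion would be something like:
    //   SELECT PriceMilli / 1000.0 AS Price FROM dbo.Ticks;
}
```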

answered Sep 20 '22 03:09 by Tim