What is the fastest way to do this:
What I do now:
I get about 60k-75k rows imported per second, which is not enough, but quite close. I would love to hit 250.000 rows.
So far nothing is really used. I get 20% time "network I/O" blocks, have one core 80% loaded CPU side. Discs are writing out 7mb-14mb, mostly idle. Average queue length on a RAID 10 of 6 raptors is.... 0.25.
Anyone any idea how to speed this up? Faster server (so far it is virtual, 8gb ram, 4 cores, physical disc pass through for data).
Adding some clarifications:
Would vertical partitioning help, for example by a byte (tinyint) that would split the instrument universe by for example 16 tables, and me thus doing up to 16 inserts at the same time? As actually the data comes from different exchanges I could make a partition per exchange. This would be a natural split field (which is actually in instrument, but I could duplicate this data here).
Some more clarifications: Got the speed even higher (90k), now clearly limited by network IO between machines, which could be VM switching.
What I do now is do a connection per 32k rows, put up a temp table, insert into this with SqlBUlkdCopy, THEN use ONE sql statement to copy to main table - minimizes any lock times on the main table.
Most waiting time is now still on network IO. Seems I run into issues where VM wise. Will move to physical hardware in the next months ;)
If you manage 70k rows per second, you're very lucky so far. But I suspect it's because you have a very simple schema.
I can't believe you ask about this kind of load on
The network and CPUs are shared, IO is restricted: you can't use all resources. Any load stats you see are not very useful. I suspect the network load you see is traffic between the 2 virtual servers and you'll become IO bound if you resolve this
Before I go on, read this 10 lessons from 35K tps. He wasn't using a virtual box.
Here is what I'd do, assuming no SAN and no DR capability if you want to ramp up volumes.
For reference, our peak load is 12 million rows per hour (16 core, 16GB, SAN, x64) but we have complexity in the load. We are not at capacity.
Are there any indexes on the table that you could do without? EDIT: asking while you were typing.
Is it possible to turn the price into an integer, and then divide by 1000 or whatever on queries?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With