I have a data transformation product that lets you select tables in a source database and transform their row data into a destination database.
The current product (a Java-based workbench and engine) handles this by processing 1000 rows at a time across 10 threads running in parallel. This approach works fine on smaller data sets. But when I have to transform huge data sets (say about X million records) at one time, this approach still works, but it becomes far too slow.
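For context, here is a minimal sketch of that chunked, multi-threaded pattern. It assumes a JDBC-style source and destination; fetchChunk(), transformRow() and writeChunk() are hypothetical placeholders for the product's own read/transform/write logic, not real APIs.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

// Sketch of the current approach: fixed-size chunks processed by a small thread pool.
public class ChunkedTransformer {

    private static final int CHUNK_SIZE = 1000;
    private static final int THREADS = 10;

    public void run(long totalRows) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        for (long offset = 0; offset < totalRows; offset += CHUNK_SIZE) {
            final long chunkStart = offset;
            pool.submit(() -> {
                // Read one chunk from the source, transform each row, write to the destination.
                List<Object[]> rows = fetchChunk(chunkStart, CHUNK_SIZE);
                List<Object[]> out = rows.stream()
                        .map(this::transformRow)
                        .collect(Collectors.toList());
                writeChunk(out);
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.DAYS);
    }

    // Placeholders for the product's own source/transform/sink logic.
    private List<Object[]> fetchChunk(long offset, int size) { return List.of(); }
    private Object[] transformRow(Object[] row) { return row; }
    private void writeChunk(List<Object[]> rows) { }
}
```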
I started looking for solutions, and my first move was to request a hardware "beef up" of the source/destination database server machines - say, a new multi-core CPU and some extra RAM. It turns out hardware wasn't the only cost: additional database software licenses had to be procured as well, thanks to per-core licensing on multi-core processors.
So the ball is in my court now: I will have to address this issue by making changes to my product, and this is where I need your help. At the moment, I can think of one possible approach to handling huge loads:
Approach 1
This is all I could come up with for now from an architectural perspective. Have you handled this situation before? If so, how did you handle it? I'd appreciate your suggestions and help.
There are a couple of things that you could do without increasing your database license cost:
Also, if you are using plain inserts instead of bulk inserts, there is massive potential for improvement. The problem with a normal insert is that it writes information to the transaction log for every row so that the transaction can be rolled back.
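As a rough illustration, here is a hedged sketch of batching inserts through JDBC rather than issuing one INSERT per row. The table, column names, and connection source are assumptions; a true bulk-load path (e.g. the database's native bulk loader) usually gains even more because it can minimize per-row logging.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;

// Sketch: accumulate rows into JDBC batches and commit per batch, not per row.
public final class BatchInsertExample {

    private static final int BATCH_SIZE = 1000;

    public static void insertRows(Connection conn, Iterable<Object[]> rows) throws Exception {
        conn.setAutoCommit(false); // one commit per batch instead of per row
        String sql = "INSERT INTO destination_table (col_a, col_b) VALUES (?, ?)"; // hypothetical table/columns
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int count = 0;
            for (Object[] row : rows) {
                ps.setObject(1, row[0]);
                ps.setObject(2, row[1]);
                ps.addBatch();
                if (++count % BATCH_SIZE == 0) {
                    ps.executeBatch(); // send the accumulated batch in one round trip
                    conn.commit();
                }
            }
            ps.executeBatch();         // flush any remaining rows
            conn.commit();
        }
    }
}
```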
In this case I was able to help someone reduce the load time from 10 hours to 6 minutes :)