I have an application that, in its simplest form, reads a large number of phone numbers (about 15 million) from a database and sends each number, one line at a time, to a URL for processing. I designed the application like this:
The problem is that it still takes a long time to complete. MSMQ also has a limit on the size of the messages it can take, so I now have to create multiple message queues. I need a lot of fault tolerance, but I dare not make my message queue transactional because of the performance cost. I'm thinking of publishing the message queue (currently a private queue) to Active Directory so that processes on different systems can dequeue from it and the job can complete more quickly. Also, my processors hit 100% during execution, and I'm currently changing the application to use a thread pool. I'm willing to explore JMS if it will handle the queue better. So far, the most efficient part of the whole process is the SSIS part.
I'd like to hear about a better design approach, especially if you've handled this kind of volume before. I'm ready to switch to Unix or use Lisp if it handles this kind of situation better.
Thanks.
Here is a simple super pragmatic solution:
First, split your text file into smaller files, with something like 10,000 entries in each file. Let's call them numbers_x.queue.
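As a rough illustration of the splitting step, here is a minimal Python sketch. It assumes the numbers have already been exported to a plain text file with one number per line. The function name `split_into_queue_files` and the zero-based numbering of the output files are my own choices for the example, not something the approach depends on.

```python
from itertools import islice
from pathlib import Path


def split_into_queue_files(source, out_dir, entries_per_file=10_000):
    """Split `source` (one number per line) into numbers_<n>.queue files.

    Returns the list of files that were created, in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    created = []
    with open(source, "r", encoding="utf-8") as src:
        index = 0
        while True:
            # Read at most `entries_per_file` lines without loading the whole file.
            chunk = list(islice(src, entries_per_file))
            if not chunk:
                break
            target = out_dir / f"numbers_{index}.queue"
            with open(target, "w", encoding="utf-8") as dst:
                dst.writelines(chunk)
            created.append(target)
            index += 1
    return created
```

With 15 million numbers and 10,000 entries per file, this would produce 1,500 .queue files.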
Create a thread-pool-based app where each thread processes the files using the following steps:
While this is a pretty crude approach, it is super easy to implement and pretty fault tolerant, and you can easily split the .queue files between a set of servers and have them work in parallel.
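One way the thread-pool worker could look, as a hedged Python sketch. The per-file scheme here is an assumption of mine: a worker claims a file by renaming it to .processing, sends every number, then renames it to .done, and puts it back as .queue if anything fails. The `send` callable is a hypothetical stand-in for whatever posts a number to your URL, and `process_queue_file` and `process_all` are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def process_queue_file(path, send):
    """Claim one .queue file, send every number in it, then mark it done."""
    path = Path(path)
    claimed = path.with_suffix(".processing")
    try:
        # Assumed claim step: the rename fails if another worker took the file.
        path.rename(claimed)
    except FileNotFoundError:
        return False
    try:
        with open(claimed, "r", encoding="utf-8") as f:
            for line in f:
                number = line.strip()
                if number:
                    send(number)  # hypothetical: posts one number to the URL
    except Exception:
        # Put the file back so it can be retried later.
        claimed.rename(path)
        raise
    claimed.rename(claimed.with_suffix(".done"))
    return True


def process_all(queue_dir, send, workers=8):
    """Process every .queue file in `queue_dir`; return how many were completed."""
    files = sorted(Path(queue_dir).glob("*.queue"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda p: process_queue_file(p, send), files))
    return sum(results)
```

Note that in this sketch a file that fails halfway is retried from the start, so some numbers may be sent more than once. Smaller .queue files reduce how much gets repeated.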