I have a C#.NET application that needs to inform anywhere from 4000 to 40,000 connected devices to perform a task all at once (or as close to simultaneous as possible).
The application works well; however, I am not satisfied with the performance. In a perfect world, as soon as I send the command I would like to see all of the devices respond simultaneously. Yet, there seems to be a delay as all the threads I have created spin up and perform the task.
I have used the .NET 4.0 ThreadPool, created my own solution using custom threads and I have even tweaked the existing ThreadPool to allow for more threads to be executed at once.
I still want better performance and that is why I am here. Any ideas? Comments? Suggestion? Thank you.
-Shaun
Let me add that the application notifies these 'connected devices' that they need to go listen for audio on a multicast address.
A dual-core hyperthreaded processor MAY be able to execute 4 threads simultaneously - depending on what the thread is doing (no contention on IO or memory access, etc). A quad-core hyperthread perhaps 8. But 40K just can't physically happen.
If you want near simultaneous, you're better off spinning up just as many threads as the computer has free cores and having each thread fire off notifications then end. You'll get rid of a bunch of context switching this way.
Or, look elsewhere. As SB recommended in the comments, use a UDP multicast to notify listening machines that they should do something.
You cannot execute 4000 threads simultaneously, let alone 40k. At best on a desktop box with hyperthreading, you might get up to 8 simultaneous processes going (this assumes quad core). Threads are pseudo-parallel, and that's not even digging into the issues of bus contention.
If you absolutely need simultaneity for 40k devices, you want some form of hardware synchronization.
It sounds like you have some control over what software runs on each device. In which case, you could look to HPC usage and architect your devices (nodes) hierarchically and/or use MPI to execute your remote processes.
For the hierarchy example: Designate say, 8 nodes as primary masters, again with 8 slave nodes, each slave can act as a master too with 8 slaves (you might need to look at an automated subscription algorithm to do this). You will have a hierarchy 6 deep to cover 40,000 nodes. Each master has a small portion of code running continually waiting for instructions to pass to slaves.
All you then do is pass the instruction to the 8 primary masters and your instruction will be propagated to the ‘cluster’ on the wire asynchronously by the masters. The instruction only has to be passed on a maximum of 5 times, and thus will be propagated v-quickly.
Alternatively (or in conjunction) you could look at MPI, which is an out-of-the-can solution. There are some established C# implementations.
The overhead of creating thousands of threads is (very) significant; I would seek an alternative solution. This sounds like a job for asynchronous IO: your computer presumably only has one network connection, so no more than one message can be sent at a time - threads cannot improve on this!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With