 

.NET distributed grid computing migration: recommendations on libraries and architecture [closed]

I have a C# multi-threaded Monte Carlo simulation. The application is already structured so that it can be partitioned into Tasks that execute independently. A TaskController executes the Tasks, aggregates intermediate results, checks for convergence (the early-termination criterion), and then returns the final results. This is currently implemented using a ThreadPool.
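A minimal sketch of the TaskController pattern described above, written in Python purely for illustration (the real application is C#). It assumes each Task is an independent Monte Carlo run, here a hypothetical estimate of π, and that "convergence" means the standard error of the aggregated estimate falls below a tolerance. `run_task` and `TaskController` are illustrative names, not the poster's code, and Python's `ThreadPoolExecutor` stands in for the .NET ThreadPool.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_task(seed, samples):
    """One independent Task: count random points that land inside the unit quarter circle."""
    rng = random.Random(seed)  # separate, seeded RNG per task so tasks never share state
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits, samples


class TaskController:
    """Hypothetical controller: runs Tasks, aggregates results, and stops early on convergence."""

    def __init__(self, tolerance, samples_per_task=10_000, max_tasks=200, workers=4):
        self.tolerance = tolerance
        self.samples_per_task = samples_per_task
        self.max_tasks = max_tasks
        self.workers = workers

    def run(self):
        hits = total = completed = 0
        estimate = std_err = float("nan")
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            futures = [pool.submit(run_task, seed, self.samples_per_task)
                       for seed in range(self.max_tasks)]
            for future in as_completed(futures):
                task_hits, task_samples = future.result()
                hits += task_hits
                total += task_samples
                completed += 1
                p = hits / total
                estimate = 4.0 * p                                # pi is about 4 * fraction of hits
                std_err = 4.0 * math.sqrt(p * (1.0 - p) / total)  # scaled binomial standard error
                if std_err < self.tolerance:                      # early-termination criterion
                    for f in futures:
                        f.cancel()  # only cancels tasks that have not started yet
                    break
        return estimate, std_err, completed
```

Cancelling a future only stops tasks that have not started; tasks already running finish before the `with` block exits. Because of CPython's GIL, a CPU-bound version would use `ProcessPoolExecutor` instead, which accepts the same module-level `run_task` function.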

I'd like to leverage more than one computer for this calculation. I don't have the approval or infrastructure to use IIS (this policy is not going to change), but I can use, for example, WCF with the NetTcpBinding endpoint binding. I've tested this communication across servers; it has the appropriate permissions and works.

To start, I'm thinking of having one master exe (console app) and several slaves on other servers as dedicated workers (should these be exes or Windows services?). Eventually I could have this run on hundreds of workstations (as well as servers) within the company during idle time (or when a screensaver is active).

I could write this myself, but then I'd have to handle communications (one-way or two-way?), early termination (checking intermediate results for convergence), cancelling tasks that are no longer required, deployment of work, discovery of machines that are available and ready to receive work, throttling or pausing work if a workstation is no longer idle, and everything else that goes into a distributed system.
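Pausing and cancellation can be handled on the worker side by splitting each task into chunks and checking two flags between chunks. This is a hedged Python sketch under that assumption: `ChunkedWorker` and the chunk callback are hypothetical, and something else (for example, an idle or screensaver detector, not shown) would call `pause()`/`resume()`, while the master would trigger `cancel()`.

```python
import threading


class ChunkedWorker:
    """Hypothetical worker that runs a task in chunks so it can pause or stop between chunks."""

    def __init__(self, chunks):
        self.chunks = chunks
        self.completed_chunks = 0
        self.cancelled = threading.Event()  # set when the master no longer needs this task
        self.may_run = threading.Event()    # cleared while the workstation is in use
        self.may_run.set()

    def pause(self):
        self.may_run.clear()

    def resume(self):
        self.may_run.set()

    def cancel(self):
        self.cancelled.set()
        self.may_run.set()  # wake a paused worker so it notices the cancellation

    def run(self, do_chunk):
        for i in range(self.chunks):
            self.may_run.wait()  # blocks while paused
            if self.cancelled.is_set():
                return "cancelled"
            do_chunk(i)
            self.completed_chunks += 1
        return "finished"
```

The granularity of the chunks sets the trade-off: smaller chunks react faster to a user returning to the workstation, at the cost of more flag checks.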

Should the master (task controller) know the addresses (IPs) of all the slave workers and tell them to do work (if they are available), should the slave workers just know the master's address and request work when they are in a position to do so, or should communication flow both ways? This will run on a 24-hour clock, with about 9 runs kicked off per day to support different business regions.
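One common answer to the push/pull question is the pull model: workers know only the master's address and ask for work when they are ready, so the master never needs a list of worker IPs. The Python sketch below simulates this in one process, with threads standing in for machines and direct method calls standing in for the network (in the question's setting these would be WCF calls over NetTcpBinding). `Master`, `request_work`, `submit_result`, and `is_idle` are hypothetical names.

```python
import queue
import threading


class Master:
    """Hypothetical master: holds pending tasks and collects results; never needs worker addresses."""

    def __init__(self, tasks):
        self.pending = queue.Queue()
        for task in tasks:
            self.pending.put(task)
        self.results = {}
        self.lock = threading.Lock()

    def request_work(self):
        # Called by a worker when it is ready; returns None when there is nothing to do.
        try:
            return self.pending.get_nowait()
        except queue.Empty:
            return None

    def submit_result(self, worker_id, task, value):
        with self.lock:
            self.results[task] = (worker_id, value)


def worker_loop(worker_id, master, is_idle):
    """A worker knows only the master and asks for work only while its machine is idle."""
    while is_idle():
        task = master.request_work()
        if task is None:
            break
        master.submit_result(worker_id, task, task * task)  # stand-in for real work
```

Adding a machine means starting another `worker_loop`; nothing on the master changes, which suits a pool of workstations that come and go with idle time.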

I am looking for recommendations for .NET grid/distributed-computing libraries that can help, and for some architecture advice in this endeavour.

Update

Does anyone have experience using any of the following?

http://www.digipede.net/ (commercial)
http://www.gridbus.org/~alchemi/
http://ngrid.sourceforge.net/
http://www.osl.iu.edu/research/mpi.net/

Or has anyone used JavaSpaces or Jini from .NET, or found equivalent .NET technologies?

http://java.sun.com/developer/technicalArticles/tools/JavaSpaces/
http://www.jini.org

Thanks

m3ntat asked Aug 04 '09 10:08



1 Answer

I would investigate the possibility of using a space-based architecture for this.

The master would write the jobs into a space (essentially an object repository). The consuming clients continually look for jobs; as jobs become available, they take them from the space, process them, and write the results back to that space or to another one (all under a transaction). You would tag jobs as belonging to a particular run in order to group results.
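To make the space idea concrete, here is a minimal in-memory tuple space in Python. It is an assumption-laden sketch, not JavaSpaces: entries are dicts, a template matches any entry whose listed fields are equal (loosely analogous to JavaSpaces template matching, where unset fields act as wildcards), and there are no transactions, leases, or network access. The `run` tag, the region names in the usage, and `consumer` are hypothetical.

```python
import threading


class Space:
    """Minimal in-memory stand-in for a tuple space; not a JavaSpaces implementation."""

    def __init__(self):
        self._entries = []
        self._cond = threading.Condition()

    @staticmethod
    def _matches(entry, template):
        # An entry matches if every field named in the template has the same value.
        return all(entry.get(key) == value for key, value in template.items())

    def write(self, entry):
        with self._cond:
            self._entries.append(dict(entry))
            self._cond.notify_all()  # wake consumers blocked in take()

    def take(self, template, timeout=None):
        """Remove and return the first matching entry; None if nothing matches within timeout."""
        with self._cond:
            found = self._cond.wait_for(
                lambda: any(self._matches(e, template) for e in self._entries), timeout)
            if not found:
                return None
            for i, entry in enumerate(self._entries):
                if self._matches(entry, template):
                    return self._entries.pop(i)


def consumer(space):
    """Take any job, process it, and write a result tagged with the same run."""
    while True:
        job = space.take({"kind": "job"}, timeout=0.1)
        if job is None:
            return  # no work right now; a real consumer would keep polling
        space.write({"kind": "result", "run": job["run"], "n": job["n"],
                     "value": job["n"] ** 2})  # stand-in for a real calculation
```

For example, a producer might write `{"kind": "job", "run": "EMEA", "n": 0}` and later collect results with `space.take({"kind": "result", "run": "EMEA"})`. Because consumers select jobs only by `kind`, any number of them can be started without the producer knowing about them, while the `run` field keeps each run's results separate.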

The advantage is that this scales very easily: you simply add more consumers. The consumers determine for themselves when they can work, and need only be configured with information about the space (how to find it). The producer is completely decoupled from the set of consumers.

Because work is processed under a transaction, if a consumer fails to complete, the work returns to the space and is available for processing by another consumer.
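The failure behaviour described here can be approximated with provisional takes that must be committed before a lease runs out. In this Python sketch (a simplification of the JavaSpaces/Jini transaction model; `TransactionalSpace` and its methods are hypothetical), an entry taken under a lease goes back into the space if the consumer aborts, or if it neither commits nor aborts before the deadline, which is how a crashed consumer's job becomes available again.

```python
import threading
import time


class TransactionalSpace:
    """Hypothetical space where take() is provisional until commit(); aborted or expired takes return the entry."""

    def __init__(self):
        self._entries = []
        self._in_flight = {}  # txn id -> (entry, lease deadline)
        self._lock = threading.Lock()
        self._next_txn = 0

    def write(self, entry):
        with self._lock:
            self._entries.append(entry)

    def _reclaim_expired(self):
        now = time.monotonic()
        for txn, (entry, deadline) in list(self._in_flight.items()):
            if now >= deadline:
                del self._in_flight[txn]
                self._entries.append(entry)  # consumer presumed dead: job is available again

    def take(self, lease_seconds):
        """Provisionally remove an entry; returns (txn_id, entry), or None if the space is empty."""
        with self._lock:
            self._reclaim_expired()
            if not self._entries:
                return None
            entry = self._entries.pop(0)
            txn = self._next_txn
            self._next_txn += 1
            self._in_flight[txn] = (entry, time.monotonic() + lease_seconds)
            return txn, entry

    def commit(self, txn):
        """Make the take permanent; returns False if the lease had already expired."""
        with self._lock:
            item = self._in_flight.pop(txn, None)
            if item is None:
                return False
            entry, deadline = item
            if time.monotonic() >= deadline:
                self._entries.append(entry)  # too late: the job goes back into the space
                return False
            return True

    def abort(self, txn):
        """Give the entry back explicitly, e.g. when a consumer is interrupted."""
        with self._lock:
            item = self._in_flight.pop(txn, None)
            if item is not None:
                self._entries.append(item[0])
```

A consumer would typically commit only after it has written its result, so a job is either fully processed or returned for another consumer to pick up.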

You can handle intermediate results easily: the producer takes results from the space and can derive intermediate values as results become available. You can also cancel jobs easily: simply remove them from the space.
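On the producer side, intermediate results and cancellation might look like the following Python sketch. It assumes a hypothetical single-process `SimpleSpace` of dict entries and treats convergence as the standard error of the collected values dropping below a tolerance; once that happens, the producer removes the run's remaining jobs from the space so no consumer picks them up. Jobs already taken by a consumer are not affected, and their results would simply be ignored.

```python
import statistics


class SimpleSpace:
    """Hypothetical single-process space: a list of dict entries matched by template."""

    def __init__(self):
        self.entries = []

    def write(self, entry):
        self.entries.append(entry)

    def take(self, template):
        for i, entry in enumerate(self.entries):
            if all(entry.get(k) == v for k, v in template.items()):
                return self.entries.pop(i)
        return None

    def take_all(self, template):
        taken = []
        while (entry := self.take(template)) is not None:
            taken.append(entry)
        return taken


def collect(space, run_id, expected, tolerance, min_results=3):
    """Derive an intermediate estimate from each result; cancel leftover jobs once it converges."""
    values = []
    while len(values) < expected:
        result = space.take({"kind": "result", "run": run_id})
        if result is None:
            break  # a real producer would block here until more results arrive
        values.append(result["value"])
        if len(values) >= min_results:
            sem = statistics.stdev(values) / len(values) ** 0.5  # standard error of the mean
            if sem < tolerance:
                cancelled = space.take_all({"kind": "job", "run": run_id})
                return statistics.fmean(values), len(values), len(cancelled)
    return (statistics.fmean(values) if values else None), len(values), 0
```

The function returns the estimate, the number of results used, and the number of jobs cancelled.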

You can add more producers very easily. They simply write to the same space, and if the jobs are tagged appropriately, results are tied to the producer unambiguously.

Unfortunately, I'm not sure what frameworks are available for .NET (I'm from the Java world and would use JavaSpaces, which use dynamic discovery and need next to no configuration), but it's worth some Googling. Perhaps, if it is powerful enough, you could write the C# producers/consumers to interface with a JavaSpaces infrastructure.

Brian Agnew answered Sep 29 '22 08:09
