Distributed Computing In C#

I have a specific DLL that contains some language processing classes and methods. One of these methods takes a word as an argument, performs a calculation that takes about 3 seconds, and saves the result to a SQL Server database.

I want to run this DLL method on 900k words, and the job may repeat every week. How can I easily distribute this work across multiple systems to save time, using C#?

ARZ asked Nov 27 '22 22:11


2 Answers

Answer in the form: Requirement -- Tool

Scheduled Runs -- Quartz.NET

Quartz.NET lets you run "jobs" on any given schedule. It can also maintain state between runs, so if the server goes down for some reason, it knows to begin running the job again when it comes back up. Pretty cool stuff.

Distributed Queue -- NServiceBus

A good service bus is worth its weight in gold. Basically, you want each worker to take one operation at a time from the queue for as long as operations remain queued. Because a queued message may be delivered more than once, the operations should be idempotent; if you ensure that, NServiceBus is a great way to accomplish this.

Queue -> Worker 1 + Worker 2 + Worker 3 -> Local Data Storage -> Data Queue + Workers -> Remote Data Storage

Data Cache -- RavenDB or SQLite

To keep the workers sufficiently isolated from the SQL Server, cache the return value of each operation in a local storage system first. This could be something fast and non-relational like RavenDB or something structured like SQLite. You'd then throw an identifier for the result into another queue via NServiceBus and sync it to the SQL Server from there. Queues are your friend! :-)

Async Operations -- Task Parallel Library and TPL Dataflow

You essentially want to ensure that none of your operations block and that each one is sufficiently atomic. If you don't know about the TPL already, you should; it's really powerful stuff! I hear this a lot from Java folks, but it's worth mentioning... C# is becoming a really great language for async and parallel workflows!

Also, one cool thing coming out of the new Async CTP is TPL Dataflow. I haven't used it, but it seems to be right up your alley!
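As a rough illustration of the TPL part on a single machine, here is a minimal sketch using `Parallel.ForEach` with a bounded degree of parallelism. The class name `ParallelWordRunner` and the `processWord` delegate are hypothetical stand-ins for the call into your DLL, and the sketch assumes that method is safe to call from several threads at once, which the question does not confirm.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelWordRunner
{
    // Runs processWord for every word, using at most maxParallel concurrent calls.
    // Returns the words whose processing threw, so they can be retried later.
    // processWord is a hypothetical wrapper around the DLL method.
    public static List<string> Run(IEnumerable<string> words, Action<string> processWord, int maxParallel)
    {
        var failed = new ConcurrentBag<string>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(words, options, word =>
        {
            try
            {
                processWord(word);
            }
            catch (Exception)
            {
                // Record the failure instead of aborting the whole run.
                failed.Add(word);
            }
        });

        return new List<string>(failed);
    }
}
```

A call might look like `ParallelWordRunner.Run(words, w => processor.Process(w), Environment.ProcessorCount)`, where `processor.Process` is again an assumed name. Collecting failures rather than rethrowing means one bad word does not stop the other 900k, and a retry is only safe if the operation is idempotent, as noted above.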

Anuj answered Nov 30 '22 11:11


Since it's existing code, I would look for a way to split that list of 900k words.

Everything else would require far more changes.
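For scale: at about 3 seconds per word, 900k words is roughly 2,700,000 seconds, or 750 hours, if processed one at a time on one machine. Split evenly across 10 machines, that would be about 75 hours each. One simple way to split the list, sketched below, is to run the same program on every machine and give each one a worker index and the total worker count. The names `WordPartitioner` and `SliceFor` are made up for illustration, and the sketch assumes every machine can read the same word list in the same order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordPartitioner
{
    // Returns the words that worker number workerIndex (0-based) should handle
    // when the list is shared among workerCount machines.
    public static List<string> SliceFor(IReadOnlyList<string> words, int workerIndex, int workerCount)
    {
        if (workerCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(workerCount));
        if (workerIndex < 0 || workerIndex >= workerCount)
            throw new ArgumentOutOfRangeException(nameof(workerIndex));

        // Round-robin: word i goes to worker i % workerCount,
        // so every word lands on exactly one worker.
        return words.Where((word, i) => i % workerCount == workerIndex).ToList();
    }
}
```

Each machine would then loop over its own slice and call the existing DLL method unchanged. This needs no queue or extra infrastructure, at the cost of not rebalancing work if one machine is slower or goes down.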

Henk Holterman answered Nov 30 '22 12:11