In software development, we can be asked to do very ambitious things with technology.
Recently I was asked about the quickest possible way to convert 4,000 documents from Word to PDF. The code/software to do the conversion is in place, and it runs on a dedicated server, so the hardware is also there (this is a recurring task). But from a C# performance perspective, what is the best way to do this?
I keep thinking along the lines of breaking this up into chunks (e.g. 4 documents each) and converting them in parallel (i.e. 4 unique documents × 1,000 parallel tasks), all running at the same time. Is this the right idea, performance-wise? The simplest (and slowest) approach is a serial loop that goes through each document.
What would you recommend? There are no language constraints, so C# 4.0, LINQ, etc. are all available.
1,000 parallel tasks? You want to run 1,000 threads concurrently? You'll spend more time context switching than doing actual work. If you have a quad-core machine, you should run four threads, each converting a single document at a time.
Probably the best way to start is to use a simple Parallel.ForEach and let the runtime library worry about scheduling the tasks. Something like:
using System.Collections.Generic;
using System.Threading.Tasks;

List<string> DocumentsToConvert = new List<string>();
// Here, load the file names of all the documents you want to convert.

// Then, process them with:
Parallel.ForEach(DocumentsToConvert, doc => ConvertDocument(doc));
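If you want to enforce the core-count point above explicitly, Parallel.ForEach also accepts a ParallelOptions argument whose MaxDegreeOfParallelism caps the number of concurrent conversions. A minimal sketch, assuming ConvertDocument is your existing conversion routine:

var options = new ParallelOptions
{
    // Cap concurrency at the number of logical processors.
    MaxDegreeOfParallelism = Environment.ProcessorCount
};
Parallel.ForEach(DocumentsToConvert, options, doc => ConvertDocument(doc));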
You could do the same type of thing with the TPL and tasks:
var tasks = new List<Task>();
foreach (var doc in DocumentsToConvert)
{
    var document = doc; // copy for the closure (C# 4.0 captures the loop variable)
    // Create and start a task to convert that document.
    tasks.Add(Task.Factory.StartNew(() => ConvertDocument(document)));
}
Task.WaitAll(tasks.ToArray()); // wait for all conversions to finish
In either case, you let the runtime library figure out how many tasks to execute in parallel.
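Since LINQ is on the table, PLINQ is a third option that reads almost like the serial loop; a sketch under the same assumption that ConvertDocument does the actual work (requires using System.Linq):

DocumentsToConvert
    .AsParallel()
    .WithDegreeOfParallelism(Environment.ProcessorCount) // optional cap on concurrency
    .ForAll(doc => ConvertDocument(doc));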