Imagine I have a function that goes through a million (or a billion) strings and checks something in each of them. For example:
foreach (string item in ListOfStrings)
{
    result.Add(CalculateSmth(item));
}
It consumes a lot of time, because CalculateSmth is a very time-consuming function.
My question is: how do I introduce multithreading into this kind of process?
For example, I want to fire up 5 threads, each of which returns some results, and this goes on until the list runs out of items.
Can anyone show some examples or point me to articles?
I forgot to mention that I need this in .NET 2.0.
You could try the Parallel Extensions (part of .NET 4.0).
These allow you to write something like:
Parallel.ForEach(ListOfStrings, item =>
{
    result.Add(CalculateSmth(item));
});
Of course, result.Add would need to be thread-safe.
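As a minimal sketch of one way to do that, assuming results is a plain List&lt;Foo&gt; (Foo, ListOfStrings and CalculateSmth stand in for the question's own types), you can do the expensive work outside a lock and only serialize the Add:
object resultsLock = new object();
List<Foo> results = new List<Foo>();

Parallel.ForEach(ListOfStrings, item =>
{
    // Do the expensive work without holding the lock.
    Foo value = CalculateSmth(item);

    // List<T> is not thread-safe, so only the Add is serialized.
    lock (resultsLock)
    {
        results.Add(value);
    }
});
On .NET 4.0 you could also collect into a System.Collections.Concurrent.ConcurrentBag&lt;Foo&gt; instead of locking a List&lt;Foo&gt;.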
The Parallel Extensions are cool, but this can also be done just by using the ThreadPool, like this:
using System.Collections.Generic;
using System.Threading;

namespace noocyte.Threading
{
    // Carries one input string plus the event used to signal that it has been processed.
    class CalcState
    {
        public CalcState(ManualResetEvent reset, string input)
        {
            Reset = reset;
            Input = input;
        }

        public ManualResetEvent Reset { get; private set; }
        public string Input { get; set; }
    }

    class CalculateMT
    {
        private readonly List<string> result = new List<string>();
        private readonly List<ManualResetEvent> events = new List<ManualResetEvent>();
        private readonly object resultLock = new object();

        private void Calc()
        {
            List<string> aList = new List<string>();
            aList.Add("test");

            // Queue one work item per input string and remember its completion event.
            foreach (string item in aList)
            {
                CalcState cs = new CalcState(new ManualResetEvent(false), item);
                events.Add(cs.Reset);
                ThreadPool.QueueUserWorkItem(new WaitCallback(Calculate), cs);
            }

            // Block until every queued work item has signalled completion.
            WaitHandle.WaitAll(events.ToArray());
        }

        private void Calculate(object s)
        {
            CalcState cs = (CalcState)s;

            // List<T> is not thread-safe, so serialize access to the shared result list.
            lock (resultLock)
            {
                result.Add(cs.Input);
            }

            // Signal completion only after the result has been stored.
            cs.Reset.Set();
        }
    }
}
Note that concurrency doesn't magically give you more resources. You first need to establish what is slowing CalculateSmth down.
For example, if it's CPU-bound (and you're on a single core), then the same number of CPU cycles go to the code whether you execute it sequentially or in parallel, plus you'd pay some overhead for managing the threads. The same argument applies to other constraints (e.g. I/O).
You'll only get performance gains here if CalculateSmth leaves some resource idle during its execution that another instance could use. That's not uncommon: for example, if the task involves I/O followed by some CPU work, then instance 1 can be doing the CPU work while instance 2 is doing the I/O. As mats points out, a chain of producer-consumer units can achieve this, if you have the infrastructure.
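As a rough illustration only, here is a minimal .NET 2.0-style sketch of such a producer-consumer pair, using a Queue&lt;string&gt; guarded by Monitor; the stage contents (and the CalculateSmth stub) are placeholders for your own I/O and CPU work:
using System.Collections.Generic;
using System.Threading;

class Pipeline
{
    private readonly Queue<string> queue = new Queue<string>();
    private bool done;

    // Producer stage: e.g. the I/O-bound part that reads the strings.
    public void Produce(IEnumerable<string> source)
    {
        foreach (string item in source)
        {
            lock (queue)
            {
                queue.Enqueue(item);
                Monitor.Pulse(queue);      // wake a waiting consumer
            }
        }
        lock (queue)
        {
            done = true;
            Monitor.PulseAll(queue);       // let all consumers drain the queue and exit
        }
    }

    // Consumer stage: e.g. the CPU-bound CalculateSmth part.
    public void Consume()
    {
        while (true)
        {
            string item;
            lock (queue)
            {
                while (queue.Count == 0 && !done)
                    Monitor.Wait(queue);
                if (queue.Count == 0)
                    return;                // producer finished and queue is empty
                item = queue.Dequeue();
            }
            CalculateSmth(item);           // expensive work happens outside the lock
        }
    }

    // Stand-in for the question's expensive function.
    private static void CalculateSmth(string item) { }
}
You would then start one Thread running Produce and one or more running Consume, and Join them all before merging results.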
You need to split up the work you want to do in parallel. Here is an example of how you can split the work in two:
List<string> work = (some list with lots of strings)

// Split the work in two
List<string> odd = new List<string>();
List<string> even = new List<string>();
for (int i = 0; i < work.Count; i++)
{
    if (i % 2 == 0)
    {
        even.Add(work[i]);
    }
    else
    {
        odd.Add(work[i]);
    }
}

// Set up two worker delegates
List<Foo> oddResult = new List<Foo>();
Action oddWork = delegate { foreach (string item in odd) oddResult.Add(CalculateSmth(item)); };

List<Foo> evenResult = new List<Foo>();
Action evenWork = delegate { foreach (string item in even) evenResult.Add(CalculateSmth(item)); };

// Run the two delegates asynchronously
IAsyncResult evenHandle = evenWork.BeginInvoke(null, null);
IAsyncResult oddHandle = oddWork.BeginInvoke(null, null);

// Wait for both to finish
evenWork.EndInvoke(evenHandle);
oddWork.EndInvoke(oddHandle);

// Merge the results from the two jobs
List<Foo> allResults = new List<Foo>();
allResults.AddRange(oddResult);
allResults.AddRange(evenResult);
return allResults;
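If you want more than two workers (the question mentions five threads), the same idea generalizes. The following is only a sketch under the same assumptions as above (Foo and CalculateSmth are the question's own types): it partitions the list across N plain Threads, which works on .NET 2.0, and each thread writes only to its own result list so no locking is needed.
static List<Foo> CalculateAll(List<string> work, int threadCount)
{
    List<Foo>[] partialResults = new List<Foo>[threadCount];
    Thread[] threads = new Thread[threadCount];

    for (int t = 0; t < threadCount; t++)
    {
        int index = t; // capture a copy of the loop variable for the anonymous method
        partialResults[index] = new List<Foo>();
        threads[index] = new Thread(delegate()
        {
            // Each thread handles every threadCount-th item, starting at its own index.
            for (int i = index; i < work.Count; i += threadCount)
            {
                partialResults[index].Add(CalculateSmth(work[i]));
            }
        });
        threads[index].Start();
    }

    // Wait for all workers, then merge their private result lists in thread order.
    List<Foo> allResults = new List<Foo>();
    for (int t = 0; t < threadCount; t++)
    {
        threads[t].Join();
        allResults.AddRange(partialResults[t]);
    }
    return allResults;
}
Called as CalculateAll(work, 5), this gives the five workers the question asked for.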