I have an array x[] containing data. Also there is an array of "system states" c[]. The process:
for(i = 1; i < N; i++)
{
a = f1(x[i] + c[i-1]);
b = f2(x[i] + c[i-1]);
c[i] = a + b;
}
Is there any efficient way to find the values of f1
and f2
on 2-core system using 2 parallel threads? I mean the following (in pseudo-code):
thread_1
{
for(i = 1; i < N; i++)
a = f1(x[i] + c[i-1]);
}
thread_2
{
for(i = 1; i < N; i++)
{
b = f2(x[i] + c[i-1]);
c[i] = a + b; //here we somehow get a{i} from thread_1
}
}
f1
and f2
are not time consumptive, but have to be calculated many times, so desired speedup is about x2. See the diagram for graphical representation:
Looking for code examples for Windows.
Synchronization is the use of language or library mechanisms to constrain the ordering (interleaving) of instructions performed by separate threads, to preclude orderings that lead to incorrect or undesired results.
Parallel programming is the process of using a set of resources to solve a problem in less time by dividing the work. Using parallel programming in C is important to increase the performance of the software.
C++17 added support for parallel algorithms to the standard library, to help programs take advantage of parallel execution for improved performance.
If I understand you right,
a[i]
can only be calculated when c[i-1]
is availableb[i]
can only be calculated when c[i-1]
is availablec[i]
is only available when a[i]
and b[i]
are calculatedIt means that the only process which you can do separately is calculating a[i]
and b[i]
.
That's how I see it in C#:
for (int i = 1; i < N; i++)
{
Task<double> calcA = Task.Factory.StartNew(() => { return f1(x[i] + c[i-1]); });
Task<double> calcB = Task.Factory.StartNew(() => { return f2(x[i] + c[i-1]); });
// .Result will block the execution and wait for both calculations to complete
c[i] = calcA.Result + calcB.Result;
}
This will run two separate threads, which will calculate f1
and f2
respectively. After both f1
and f2
are calculated, it will set c[i]
value, and run the next iteration.
Note that:
double
, assuming that your f1
and f2
return double
a[0]
and b[0]
values. Otherwise, c[i-1]
would throw an exception f1
and f2
is really resource-consuming and long, compared to other calculations Task.Factory.StartNew
(unlike using Thread
) uses ThreadPool which means that it doesn't create a new thread every time, but reuses the existing from the pool. It noticably reduces the overhead.The only parallel part in this algorithm is calculation of f1 and f2, but you say that f1 and f2 are not time consumptive, so it might be much better to use SIMD vectorization (e.g. System.Numerics.Vectors in C#) and run it on one core (that also reduce cache misses). Or probably you could modify your algorithm to be parallelizeable (but it might require hard work).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With