What I want
I want to work on optimizing a fork/join algorithm. By optimization I mean calculating the optimal number of threads, or, if you prefer, calculating SEQUENTIAL_THRESHOLD (see the code below).
// PSEUDOCODE
Result solve(Problem problem) {
    if (problem.size < SEQUENTIAL_THRESHOLD)
        return solveSequentially(problem);
    else {
        Result left, right;
        INVOKE-IN-PARALLEL {
            left = solve(extractLeftHalf(problem));
            right = solve(extractRightHalf(problem));
        }
        return combine(left, right);
    }
}
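To make the pseudocode concrete, here is a minimal sketch of it as a RecursiveTask that sums a long[]. The class name, the sum problem, and the hard-coded threshold value are all illustrative assumptions; picking the threshold well is exactly the open question.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// A concrete version of the pseudocode: summing a long[].
// SEQUENTIAL_THRESHOLD here is an arbitrary placeholder value.
class SumTask extends RecursiveTask<Long> {
    static final int SEQUENTIAL_THRESHOLD = 10_000; // placeholder

    private final long[] array;
    private final int lo, hi; // half-open range [lo, hi)

    SumTask(long[] array, int lo, int hi) {
        this.array = array;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo < SEQUENTIAL_THRESHOLD) {
            long sum = 0;                       // solveSequentially
            for (int i = lo; i < hi; i++) sum += array[i];
            return sum;
        }
        int mid = lo + (hi - lo) / 2;
        SumTask left = new SumTask(array, lo, mid);
        SumTask right = new SumTask(array, mid, hi);
        left.fork();                            // INVOKE-IN-PARALLEL
        long rightResult = right.compute();
        long leftResult = left.join();
        return leftResult + rightResult;        // combine
    }

    static long sum(long[] array) {
        return new ForkJoinPool().invoke(new SumTask(array, 0, array.length));
    }
}
```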
How I imagine it
For example, suppose I want to calculate the product of a big array. Then I just evaluate all the components and get the optimal thread count:
SEQUENTIAL_THRESHOLD = PC * IS / MC
(just an example)
PC - number of processor cores;
IS - a constant indicating the optimal array size for one processor core with the simplest operation on the data (for example, reading);
MC - the cost of a multiply operation.
Suppose MC = 15, PC = 4 and IS = 10000; then SEQUENTIAL_THRESHOLD = 2667. So if a subtask's array is bigger than 2667 elements, I'll fork it.
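The proposed formula can be sketched directly in code. IS and MC are hypothetical constants taken from the example above, not measured values; only the core count is read from the runtime.

```java
// Sketch of the proposed formula SEQUENTIAL_THRESHOLD = PC * IS / MC.
// IS and MC are the hypothetical constants from the example above.
class ThresholdEstimate {
    static final int IS = 10_000; // optimal size per core, simplest operation
    static final int MC = 15;     // relative cost of a multiply

    // PC passed in explicitly, so the example numbers are reproducible
    static long sequentialThreshold(int pc) {
        return Math.round((double) pc * IS / MC);
    }

    // Or read the core count from the actual machine
    static long sequentialThreshold() {
        return sequentialThreshold(Runtime.getRuntime().availableProcessors());
    }
}
```

With PC = 4 this reproduces the 2667 from the worked example.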
Narrow question:
Do any existing investigations cover calculating SEQUENTIAL_THRESHOLD for arrays/collections/sorting? How do they accomplish it?
Updated 07 March 2014:
The fork/join framework is an implementation of the ExecutorService interface that helps you take advantage of multiple processors. It is designed for work that can be broken into smaller pieces recursively. The goal is to use all the available processing power to enhance the performance of your application.
Using the fork/join framework can speed up processing of large tasks, but to achieve this outcome we should follow some guidelines: use as few thread pools as possible; in most cases, the best decision is to use one thread pool per application or system.
In an ExecutorService you can set the number of threads according to the IO capacity of your system instead of its CPU capacity. If you want to call an IO-intensive operation from a ForkJoinTask, you should create a class that implements ForkJoinPool.ManagedBlocker.
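A minimal sketch of the ManagedBlocker idea, wrapping a blocking queue take() so the pool can compensate with an extra worker while this thread blocks. The class and helper names are assumptions for illustration; the ForkJoinPool.ManagedBlocker interface and ForkJoinPool.managedBlock() are the documented API.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ForkJoinPool;

// Wraps a blocking BlockingQueue.take() in a ManagedBlocker so a
// ForkJoinPool can spawn a compensating worker while we block.
class QueueBlocker<T> implements ForkJoinPool.ManagedBlocker {
    private final BlockingQueue<T> queue;
    private T item;

    QueueBlocker(BlockingQueue<T> queue) { this.queue = queue; }

    @Override
    public boolean block() throws InterruptedException {
        if (item == null) item = queue.take(); // the actual blocking call
        return true;                           // done blocking
    }

    @Override
    public boolean isReleasable() {
        // True when no blocking is needed (an item is already available)
        return item != null || (item = queue.poll()) != null;
    }

    // Hypothetical convenience helper: take an element via managedBlock
    static <T> T takeManaged(BlockingQueue<T> queue) throws InterruptedException {
        QueueBlocker<T> blocker = new QueueBlocker<>(queue);
        ForkJoinPool.managedBlock(blocker);
        return blocker.item;
    }
}
```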
Implementation notes: This implementation restricts the maximum number of running threads to 32767. Attempts to create pools with greater than the maximum number result in IllegalArgumentException.
There is absolutely, positively no way to calculate a proper threshold unless you are intimate with the execution environment. I maintain a fork/join project on sourceforge.net and this is the code I use in most built-in-functions:
private int calcThreshold(int nbr_elements, int passed_threshold) {
    // total threads in the session
    int threads = getNbrThreads();
    // total elements in the array
    int count = nbr_elements + 1;
    // With only one thread it doesn't pay to decompose the work,
    // so force the threshold over the array length
    if (threads == 1) {
        return count;
    }
    int threshold = passed_threshold;
    if (threshold > 0) {
        // The caller suggested a value: go with it, or adjust it here
    } else {
        // Do something useful, such as using about 8 times as many
        // tasks as threads, or the default of 32k
        int temp = count / (threads << 3);
        threshold = (temp < 32768) ? 32768 : temp;
    }
    return threshold;
}
Edit on 9 March:
How can you possibly have a general utility that knows not only the processor speed, available memory, number of processors, etc. (the physical environment) but also the intention of the software? The answer is that you cannot, which is why you need to develop a routine for each environment. The method above is what I use for basic arrays (vectors). I use another for most matrix processing:
// When very small, just spread every row
if (count < 6) return 1;
// When small, spread a little
if (count < 30) {
    int temp = count / (threads << 2);
    return (temp == 0) ? threads : temp;
}
// This works well for now
int temp = count / (threads << 3);
return (temp == 0) ? threads : temp;
As for Java 8 streams: they use the F/J framework under the hood, and you cannot specify a threshold.
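To illustrate the point about streams: a parallel stream splits its work over the common ForkJoinPool via its Spliterator, with no user-settable threshold. The only global knob is the common pool's parallelism, settable with the documented system property shown in the comment.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.LongStream;

// Parallel streams run on the common ForkJoinPool; split sizes are
// decided internally, not by a user-supplied threshold.
class StreamDemo {
    static long parallelSum(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    public static void main(String[] args) {
        // The one global knob: common pool parallelism, tunable only via
        // -Djava.util.concurrent.ForkJoinPool.common.parallelism=N
        System.out.println("common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
        System.out.println(parallelSum(1_000_000)); // 500000500000
    }
}
```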
You cannot boil this down to a simple formula for several reasons:
Each PC will have vastly different parameters depending not only on the core, but also on other factors like RAM timing or background tasks.
Java itself optimizes loops on the fly during execution, so a momentarily perfect setting could be sub-optimal a few seconds later. Or worse: the adjustment could prevent perfect optimization altogether.
The only way to go that I can see is to dynamically adjust the values with some form of AI or genetic algorithm. However, that means the program must frequently check non-optimal settings just to determine whether the current setting is still the best, so it is questionable whether the speed gained exceeds the speed lost trying other settings. In the end it is probably only a solution during an initial learning phase, with further executions then using those trained values as fixed numbers.
As this not only costs time but also greatly increases code complexity, I don't think it is an option for most programs. Often it is more beneficial not to use fork/join in the first place, as there are many other parallelization options that might suit the problem better.
An idea for a "genetic" algorithm would be to measure the loop efficiency on each run, keep a background hash map from loop parameters to execution time that is constantly updated, and select the fastest setting for most of the runs.
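That measure-and-remember idea can be sketched as follows. The class name, the candidate-threshold interface, and the selection strategy are all hypothetical; a real version would re-probe occasionally rather than only once, as the answer notes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntConsumer;

// Hypothetical sketch of the learning approach: time the workload at a
// few candidate thresholds, remember the timings, reuse the fastest.
class AdaptiveThreshold {
    private final Map<Integer, Long> timings = new HashMap<>();

    // runWithThreshold executes the real workload once at a given threshold
    int bestThreshold(int[] candidates, IntConsumer runWithThreshold) {
        for (int t : candidates) {
            long start = System.nanoTime();
            runWithThreshold.accept(t);
            timings.put(t, System.nanoTime() - start);
        }
        // Pick the threshold with the smallest measured time
        return timings.entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElseThrow(IllegalStateException::new);
    }
}
```

The map plays the role of the "background hash map from loop parameters to execution time" described above; subsequent runs would use the returned value as a fixed threshold.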