Let's say I have these two matrices:
double[][] a = new double[2][2];
a[0][0] = 1;
a[0][1] = 2;
a[1][0] = 3;
a[1][1] = 4;
double[][] b = new double[2][2];
b[0][0] = 1;
b[0][1] = 2;
b[1][0] = 3;
b[1][1] = 4;
In the traditional way, to sum these matrices I would do a nested for loop:
int rows = a.length;
int cols = a[0].length;
double[][] res = new double[rows][cols];
for (int i = 0; i < rows; i++) {
    for (int j = 0; j < cols; j++) {
        res[i][j] = a[i][j] + b[i][j];
    }
}
I'm fairly new to the Stream API, but I think it's a great fit here with parallelStream. So my question is: is there a way to do this sum and take advantage of parallel processing?
Edit: not sure if this is the right place, but here we go. Using some of the suggestions, I put the Stream versions to the test. The setup was as follows. Classical approach:
public class ClassicMatrix {
    private final double[][] components;
    private final int cols;
    private final int rows;

    public ClassicMatrix(final double[][] components) {
        this.components = components;
        this.rows = components.length;
        this.cols = components[0].length;
    }

    public ClassicMatrix addComponents(final ClassicMatrix a) {
        final double[][] res = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                res[i][j] = components[i][j] + a.components[i][j];
            }
        }
        return new ClassicMatrix(res);
    }
}
Using @dkatzel's suggestion:
public class MatrixStream1 {
    private final double[][] components;
    private final int cols;
    private final int rows;

    public MatrixStream1(final double[][] components) {
        this.components = components;
        this.rows = components.length;
        this.cols = components[0].length;
    }

    public MatrixStream1 addComponents(final MatrixStream1 a) {
        final double[][] res = new double[rows][cols];
        IntStream.range(0, rows * cols).parallel().forEach(i -> {
            int x = i / cols;
            int y = i % cols;
            res[x][y] = components[x][y] + a.components[x][y];
        });
        return new MatrixStream1(res);
    }
}
Using @Eugene's suggestion:
public class MatrixStream2 {
    private final double[][] components;
    private final int cols;
    private final int rows;

    public MatrixStream2(final double[][] components) {
        this.components = components;
        this.rows = components.length;
        this.cols = components[0].length;
    }

    public MatrixStream2 addComponents(final MatrixStream2 a) {
        final double[][] res = new double[rows][cols];
        IntStream.range(0, rows)
                .forEach(i -> Arrays.parallelSetAll(res[i], j -> components[i][j] + a.components[i][j]));
        return new MatrixStream2(res);
    }
}
and a test class, run three independent times, once for each method (just replacing the method call in main()):
public class MatrixTest {
    private final static String path = "/media/manuel/workspace/data/";

    public static void main(String[] args) {
        final List<Double[]> lst = new ArrayList<>();
        for (int i = 100; i < 8000; i = i + 400) {
            final Double[] d = testClassic(i);
            System.out.println(d[0] + " : " + d[1]);
            lst.add(d);
        }
        IOUtils.saveToFile(path + "classic.csv", lst);
    }

    public static Double[] testClassic(final int i) {
        final ClassicMatrix a = new ClassicMatrix(rand(i));
        final ClassicMatrix b = new ClassicMatrix(rand(i));
        final long start = System.currentTimeMillis();
        final ClassicMatrix mul = a.addComponents(b);
        final long now = System.currentTimeMillis();
        final double elapsed = (now - start);
        return new Double[] { (double) i, elapsed };
    }

    public static Double[] testStream1(final int i) {
        final MatrixStream1 a = new MatrixStream1(rand(i));
        final MatrixStream1 b = new MatrixStream1(rand(i));
        final long start = System.currentTimeMillis();
        final MatrixStream1 mul = a.addComponents(b);
        final long now = System.currentTimeMillis();
        final double elapsed = (now - start);
        return new Double[] { (double) i, elapsed };
    }

    public static Double[] testStream2(final int i) {
        final MatrixStream2 a = new MatrixStream2(rand(i));
        final MatrixStream2 b = new MatrixStream2(rand(i));
        final long start = System.currentTimeMillis();
        final MatrixStream2 mul = a.addComponents(b);
        final long now = System.currentTimeMillis();
        final double elapsed = (now - start);
        return new Double[] { (double) i, elapsed };
    }

    private static double[][] rand(final int size) {
        final double[][] rnd = new double[size][size];
        for (int i = 0; i < size; i++) {
            for (int j = 0; j < size; j++) {
                rnd[i][j] = Math.random();
            }
        }
        return rnd;
    }
}
The results:
Classic Matrix size, Time (ms)
100.0,1.0
500.0,5.0
900.0,5.0
1300.0,43.0
1700.0,94.0
2100.0,26.0
2500.0,33.0
2900.0,46.0
3300.0,265.0
3700.0,71.0
4100.0,87.0
4500.0,380.0
4900.0,432.0
5300.0,215.0
5700.0,238.0
6100.0,577.0
6500.0,677.0
6900.0,609.0
7300.0,584.0
7700.0,592.0
Stream1, Time(ms)
100.0,86.0
500.0,13.0
900.0,9.0
1300.0,47.0
1700.0,92.0
2100.0,29.0
2500.0,33.0
2900.0,46.0
3300.0,253.0
3700.0,71.0
4100.0,90.0
4500.0,352.0
4900.0,373.0
5300.0,497.0
5700.0,485.0
6100.0,579.0
6500.0,711.0
6900.0,800.0
7300.0,780.0
7700.0,902.0
Stream2, Time(ms)
100.0,111.0
500.0,42.0
900.0,12.0
1300.0,54.0
1700.0,97.0
2100.0,110.0
2500.0,177.0
2900.0,71.0
3300.0,250.0
3700.0,106.0
4100.0,359.0
4500.0,143.0
4900.0,233.0
5300.0,261.0
5700.0,289.0
6100.0,406.0
6500.0,814.0
6900.0,830.0
7300.0,828.0
7700.0,911.0
I made a plot for better comparison.
There's no improvement at all. Where's the flaw? Are the matrices too small (7700 x 7700)? Anything larger than that blows up my computer's memory.
One way to do it would be by using Arrays.parallelSetAll:
int rows = a.length;
int cols = a[0].length;
double[][] res = new double[rows][cols];
Arrays.parallelSetAll(res, i -> {
    Arrays.parallelSetAll(res[i], j -> a[i][j] + b[i][j]);
    return res[i];
});
I'm not 100% sure, but I think the inner call to Arrays.parallelSetAll might not be worth the overhead of spawning nested parallelization for each row's columns. Maybe it's enough to parallelize the sum across rows only:
Arrays.parallelSetAll(res, i -> {
    Arrays.setAll(res[i], j -> a[i][j] + b[i][j]);
    return res[i];
});
Anyway, you should measure carefully before adding parallelization to an algorithm, because the overhead is often so big that it isn't worth it.
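As a rough illustration of that measuring advice, here is a self-contained sketch (the MeasureSketch class, its method names, and the chosen sizes are my own, not from the question): it warms up the JIT first and takes the median of several samples, because a single timestamp difference is dominated by startup noise.

```java
import java.util.Arrays;

public class MeasureSketch {

    // Plain nested-loop sum, used as the baseline.
    static double[][] sumSequential(double[][] a, double[][] b) {
        double[][] res = new double[a.length][a[0].length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[0].length; j++)
                res[i][j] = a[i][j] + b[i][j];
        return res;
    }

    // Parallel across rows only; each row is filled sequentially.
    static double[][] sumParallel(double[][] a, double[][] b) {
        double[][] res = new double[a.length][a[0].length];
        Arrays.parallelSetAll(res, i -> {
            Arrays.setAll(res[i], j -> a[i][j] + b[i][j]);
            return res[i];
        });
        return res;
    }

    // Runs the task several times and returns the median duration in nanoseconds.
    static long medianNanos(Runnable task, int samples) {
        long[] times = new long[samples];
        for (int s = 0; s < samples; s++) {
            long t0 = System.nanoTime();
            task.run();
            times[s] = System.nanoTime() - t0;
        }
        Arrays.sort(times);
        return times[samples / 2];
    }

    public static void main(String[] args) {
        int n = 1000;
        double[][] a = new double[n][n], b = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) { a[i][j] = Math.random(); b[i][j] = Math.random(); }

        // Warm-up runs so the JIT compiles both paths before we time them.
        for (int warm = 0; warm < 5; warm++) { sumSequential(a, b); sumParallel(a, b); }

        System.out.println("sequential (ns): " + medianNanos(() -> sumSequential(a, b), 11));
        System.out.println("parallel   (ns): " + medianNanos(() -> sumParallel(a, b), 11));
    }
}
```

For anything more serious than a quick sanity check, a harness like JMH handles warm-up, forking, and dead-code elimination for you.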
This is yet to be measured (I will do it a bit later), but shouldn't the already built-in Arrays.parallelSetAll do the job faster?
for (int i = 0; i < a.length; ++i) {
    int j = i;
    Arrays.parallelSetAll(r[j], x -> a[j][x] + b[j][x]);
}
Or even nicer:
IntStream.range(0, a.length)
        .forEach(i -> Arrays.parallelSetAll(r[i], j -> a[i][j] + b[i][j]));
This also plays very nicely with CPU caches, since the probability that the next entry is in the same cache line is high. Reading in the reverse order (columns, then rows) would scatter the reads all over the place.
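The locality point can be made concrete with a small sketch (the TraversalOrder class is illustrative, not from any answer here): both loops compute the same sums, but the column-major variant touches a different row array on every inner iteration, which is exactly the scattered-read pattern described above.

```java
import java.util.Arrays;

public class TraversalOrder {

    // Row-major: the inner loop walks consecutive entries of one row,
    // so successive reads are likely in the same cache line.
    static double[][] addRowMajor(double[][] a, double[][] b) {
        double[][] r = new double[a.length][a[0].length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[0].length; j++)
                r[i][j] = a[i][j] + b[i][j];
        return r;
    }

    // Column-major: the inner loop hops to a different row array on every
    // iteration, dispersing the reads across memory.
    static double[][] addColumnMajor(double[][] a, double[][] b) {
        double[][] r = new double[a.length][a[0].length];
        for (int j = 0; j < a[0].length; j++)
            for (int i = 0; i < a.length; i++)
                r[i][j] = a[i][j] + b[i][j];
        return r;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{10, 20}, {30, 40}};
        System.out.println(Arrays.deepToString(addRowMajor(a, b)));    // [[11.0, 22.0], [33.0, 44.0]]
        System.out.println(Arrays.deepToString(addColumnMajor(a, b))); // same values either way
    }
}
```

The results are identical; only the memory access pattern (and therefore, at large sizes, the running time) differs.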
I've put a jmh test here. Notice that Federico's answer is the fastest; go upvote his idea.
Here are the results:
Benchmark (howManyEntries) Mode Cnt Score Error Units
DoubleArraySum.dkatzel 100 avgt 10 0.055 ± 0.005 ms/op
DoubleArraySum.dkatzel 500 avgt 10 0.997 ± 0.156 ms/op
DoubleArraySum.dkatzel 1000 avgt 10 4.162 ± 0.368 ms/op
DoubleArraySum.dkatzel 3000 avgt 10 39.619 ± 4.391 ms/op
DoubleArraySum.dkatzel 8000 avgt 10 236.468 ± 41.599 ms/op
DoubleArraySum.eugene 100 avgt 10 0.671 ± 0.187 ms/op
DoubleArraySum.eugene 500 avgt 10 6.317 ± 0.268 ms/op
DoubleArraySum.eugene 1000 avgt 10 14.751 ± 0.676 ms/op
DoubleArraySum.eugene 3000 avgt 10 65.174 ± 6.044 ms/op
DoubleArraySum.eugene 8000 avgt 10 285.571 ± 23.206 ms/op
DoubleArraySum.federico1 100 avgt 10 0.169 ± 0.010 ms/op
DoubleArraySum.federico1 500 avgt 10 1.999 ± 0.217 ms/op
DoubleArraySum.federico1 1000 avgt 10 6.087 ± 1.108 ms/op
DoubleArraySum.federico1 3000 avgt 10 40.825 ± 4.853 ms/op
DoubleArraySum.federico1 8000 avgt 10 267.446 ± 37.490 ms/op
DoubleArraySum.federico2 100 avgt 10 0.034 ± 0.003 ms/op
DoubleArraySum.federico2 500 avgt 10 0.974 ± 0.152 ms/op
DoubleArraySum.federico2 1000 avgt 10 3.245 ± 0.080 ms/op
DoubleArraySum.federico2 3000 avgt 10 30.503 ± 5.960 ms/op
DoubleArraySum.federico2 8000 avgt 10 183.183 ± 21.861 ms/op
DoubleArraySum.holijava 100 avgt 10 0.063 ± 0.002 ms/op
DoubleArraySum.holijava 500 avgt 10 1.112 ± 0.020 ms/op
DoubleArraySum.holijava 1000 avgt 10 4.138 ± 0.062 ms/op
DoubleArraySum.holijava 3000 avgt 10 41.784 ± 1.029 ms/op
DoubleArraySum.holijava 8000 avgt 10 266.590 ± 4.080 ms/op
DoubleArraySum.pivovarit 100 avgt 10 0.112 ± 0.002 ms/op
DoubleArraySum.pivovarit 500 avgt 10 2.427 ± 0.075 ms/op
DoubleArraySum.pivovarit 1000 avgt 10 9.572 ± 0.355 ms/op
DoubleArraySum.pivovarit 3000 avgt 10 84.413 ± 2.197 ms/op
DoubleArraySum.pivovarit 8000 avgt 10 690.942 ± 34.993 ms/op
EDIT: here is a more readable output, ranked fastest to slowest (federico2 wins for all inputs):
100=[federico2, dkatzel, holijava, pivovarit, federico1, eugene]
500=[federico2, dkatzel, holijava, federico1, pivovarit, eugene]
1000=[federico2, holijava, dkatzel, federico1, pivovarit, eugene]
3000=[federico2, dkatzel, federico1, holijava, eugene, pivovarit]
8000=[federico2, dkatzel, holijava, federico1, eugene, pivovarit]
The only option I see here is to more or less generate all possible pairs of indexes, then extract the elements and apply the summation. Parallel streams will not have any additional positive effect with such a small example, but you can surely utilize the Stream API here (and convert to parallel immediately if needed), although the result is not as nice as you might expect:
IntStream.range(0, a.length).boxed()
        .flatMap(i -> IntStream.range(0, a[0].length)
                .mapToObj(j -> new AbstractMap.SimpleImmutableEntry<>(i, j)))
        .parallel()
        .forEach(e -> {
            res[e.getKey()][e.getValue()]
                    = a[e.getKey()][e.getValue()] + b[e.getKey()][e.getValue()];
        });
We need to introduce a middleman (middle-pair?) so that we can parallelize a single Stream instead of playing with nested parallel Streams.
Another advanced way would be to implement your own custom collector but it will still involve nested looping at some point.
The true power of the Stream API can be seen when summing all values from both arrays:
Stream.concat(Arrays.stream(a), Arrays.stream(b)).parallel()
        .flatMapToDouble(Arrays::stream)
        .sum();
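As a usage example, wrapping the pipeline in a helper (the ConcatSum class and totalSum name are my own) and applying it to the two 2x2 matrices from the question, each containing 1 through 4, gives 10 + 10 = 20:

```java
import java.util.Arrays;
import java.util.stream.Stream;

public class ConcatSum {

    // Concatenates the rows of both matrices into one stream of double[],
    // flattens them into a single DoubleStream, and sums every value.
    static double totalSum(double[][] a, double[][] b) {
        return Stream.concat(Arrays.stream(a), Arrays.stream(b)).parallel()
                .flatMapToDouble(Arrays::stream)
                .sum();
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{1, 2}, {3, 4}};
        System.out.println(totalSum(a, b)); // prints 20.0
    }
}
```

Unlike the element-wise sum, this is a true reduction to a single value, so the parallel pipeline needs no shared mutable result array.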