Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java sum two double[][] with parallel stream

Let's say I have this two matrices:

double[][] a = new double[2][2]
a[0][0] = 1
a[0][1] = 2
a[1][0] = 3
a[1][1] = 4

double[][] b = new double[2][2]
b[0][0] = 1
b[0][1] = 2
b[1][0] = 3
b[1][1] = 4

in the traditional way, to sum this matrices I would do a nested for loop:

int rows = a.length;
int cols = a[0].length;
double[][] res = new double[rows][cols];
for(int i = 0; i < rows; i++){
    for(int j = 0; j < cols; j++){
        res[i][j] = a[i][j] + b[i][j];
    }
}

I'm fairly new to the stream API but I think this is a great fit to use with parallelStream so my question is if there's a way to do this and take advantage of parallel processing?

Edit: not sure if this is the right place but here we go: Using some suggestions I putted the Stream to the test. The set up was like this: Classical approach:

public class ClassicMatrix {

    private final double[][] components;
    private final int cols;
    private final int rows;




    public ClassicMatrix(final double[][] components){
    this.components = components;
    this.rows = components.length;
    this.cols = components[0].length;
    }


    public ClassicMatrix addComponents(final ClassicMatrix a) {
    final double[][] res = new double[rows][cols];
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < rows; j++) {
        res[i][j] = components[i][j] + a.components[i][j];
        }
    }
    return new ClassicMatrix(res);
    }

}

Using @dkatzel suggestion:

public class MatrixStream1 {

    private final double[][] components;
    private final int cols;
    private final int rows;

    public MatrixStream1(final double[][] components){
    this.components = components;
    this.rows = components.length;
    this.cols = components[0].length;
    }

    public MatrixStream1 addComponents(final MatrixStream1 a) {
    final double[][] res = new double[rows][cols];
    IntStream.range(0, rows*cols).parallel().forEach(i -> {
               int x = i/rows;
               int y = i%rows;

               res[x][y] = components[x][y] + a.components[x][y];
           });
    return new MatrixStream1(res);
    }
}

Using @Eugene suggestion:

public class MatrixStream2 {

    private final double[][] components;
    private final int cols;
    private final int rows;

    public MatrixStream2(final double[][] components) {
    this.components = components;
    this.rows = components.length;
    this.cols = components[0].length;
    }

    public MatrixStream2 addComponents(final MatrixStream2 a) {
    final double[][] res = new double[rows][cols];
    IntStream.range(0, rows)
        .forEach(i -> Arrays.parallelSetAll(res[i], j -> components[i][j] * a.components[i][j]));
    return new MatrixStream2(res);
    }
}

and a test class, running 3 independent times one for each method (just replacing the method name in main()):

public class MatrixTest {

    private final static String path = "/media/manuel/workspace/data/";

    public static void main(String[] args) {
    final List<Double[]> lst = new ArrayList<>();
    for (int i = 100; i < 8000; i = i + 400) {
        final Double[] d = testClassic(i); 
        System.out.println(d[0] + " : " + d[1]);
        lst.add(d);
    }
    IOUtils.saveToFile(path + "classic.csv", lst);
    }

    public static Double[] testClassic(final int i) {

    final ClassicMatrix a = new ClassicMatrix(rand(i));
    final ClassicMatrix b = new ClassicMatrix(rand(i));

    final long start = System.currentTimeMillis();
    final ClassicMatrix mul = a.addComponents(b);
    final long now = System.currentTimeMillis();
    final double elapsed = (now - start);

    return new Double[] { (double) i, elapsed };

    }

    public static Double[] testStream1(final int i) {

    final MatrixStream1 a = new MatrixStream1(rand(i));
    final MatrixStream1 b = new MatrixStream1(rand(i));

    final long start = System.currentTimeMillis();
    final MatrixStream1 mul = a.addComponents(b);
    final long now = System.currentTimeMillis();
    final double elapsed = (now - start);

    return new Double[] { (double) i, elapsed };

    }

    public static Double[] testStream2(final int i) {

    final MatrixStream2 a = new MatrixStream2(rand(i));
    final MatrixStream2 b = new MatrixStream2(rand(i));

    final long start = System.currentTimeMillis();
    final MatrixStream2 mul = a.addComponents(b);
    final long now = System.currentTimeMillis();
    final double elapsed = (now - start);

    return new Double[] { (double) i, elapsed };

    }

    private static double[][] rand(final int size) {
    final double[][] rnd = new double[size][size];
    for (int i = 0; i < size; i++) {
        for (int j = 0; j < size; j++) {
        rnd[i][j] = Math.random();
        }
    }
    return rnd;
    }
}

The results:

Classic Matrix size, Time (ms)
100.0,1.0
500.0,5.0
900.0,5.0
1300.0,43.0
1700.0,94.0
2100.0,26.0
2500.0,33.0
2900.0,46.0
3300.0,265.0
3700.0,71.0
4100.0,87.0
4500.0,380.0
4900.0,432.0
5300.0,215.0
5700.0,238.0
6100.0,577.0
6500.0,677.0
6900.0,609.0
7300.0,584.0
7700.0,592.0

Stream1, Time(ms)
100.0,86.0
500.0,13.0
900.0,9.0
1300.0,47.0
1700.0,92.0
2100.0,29.0
2500.0,33.0
2900.0,46.0
3300.0,253.0
3700.0,71.0
4100.0,90.0
4500.0,352.0
4900.0,373.0
5300.0,497.0
5700.0,485.0
6100.0,579.0
6500.0,711.0
6900.0,800.0
7300.0,780.0
7700.0,902.0

Stream2, Time(ms)
100.0,111.0
500.0,42.0
900.0,12.0
1300.0,54.0
1700.0,97.0
2100.0,110.0
2500.0,177.0
2900.0,71.0
3300.0,250.0
3700.0,106.0
4100.0,359.0
4500.0,143.0
4900.0,233.0
5300.0,261.0
5700.0,289.0
6100.0,406.0
6500.0,814.0
6900.0,830.0
7300.0,828.0
7700.0,911.0

I made a plot for better comparison: Performance Test

There's no improvement at all. Where's the flaw? Are the matrices to small (7700 x 7700)? Greater than this it blows up my computer memory.

like image 441
MLeiria Avatar asked Jul 25 '17 13:07

MLeiria


3 Answers

One way to do it would be by using Arrays.parallelSetAll:

int rows = a.length;
int cols = a[0].length;
double[][] res = new double[rows][cols];

Arrays.parallelSetAll(res, i -> {
    Arrays.parallelSetAll(res[i], j -> a[i][j] + b[i][j]);
    return res[i];
});

I'm not 100% sure, but I think the inner call to Arrays.parallelSetAll might not be worth the overhead of generating inner parallelization for each row's columns. Maybe it's just enough to parallelize the sum for each row only:

Arrays.parallelSetAll(res, i -> {
    Arrays.setAll(res[i], j -> a[i][j] + b[i][j]);
    return res[i];
});

Anyway, you should measure carefully before adding parallelization to an algorithm, because many times the overhead is so big that it's not worth using it.

like image 98
fps Avatar answered Oct 09 '22 09:10

fps


This is yet to measured (I will a bit later), but shouldn't the already build in Arrays.parallelSetAll do the job in a fastest way?

    for (int i = 0; i < a.length; ++i) {
        int j = i;
        Arrays.parallelSetAll(r[j], x -> a[j][x] + b[j][x]);
    }

Or even nicer:

IntStream.range(0, a.length)
         .forEach(i -> Arrays.parallelSetAll(r[i], j -> a[i][j] + b[i][j]));

This plays very nice with CPU caches as well, since the probability that the next entry is in the same cache line is big. Doing the read in the reverse order (columns and rows), will disperse the reads all over the place.

I've put a jmh test here. Notice that Federico's answer is the fastest. Go up vote his idea.

Here are the results:

Benchmark                 (howManyEntries)  Mode  Cnt    Score    Error  Units
DoubleArraySum.dkatzel                 100  avgt   10    0.055 ±  0.005  ms/op
DoubleArraySum.dkatzel                 500  avgt   10    0.997 ±  0.156  ms/op
DoubleArraySum.dkatzel                1000  avgt   10    4.162 ±  0.368  ms/op
DoubleArraySum.dkatzel                3000  avgt   10   39.619 ±  4.391  ms/op
DoubleArraySum.dkatzel                8000  avgt   10  236.468 ± 41.599  ms/op
DoubleArraySum.eugene                  100  avgt   10    0.671 ±  0.187  ms/op
DoubleArraySum.eugene                  500  avgt   10    6.317 ±  0.268  ms/op
DoubleArraySum.eugene                 1000  avgt   10   14.751 ±  0.676  ms/op
DoubleArraySum.eugene                 3000  avgt   10   65.174 ±  6.044  ms/op
DoubleArraySum.eugene                 8000  avgt   10  285.571 ± 23.206  ms/op
DoubleArraySum.federico1               100  avgt   10    0.169 ±  0.010  ms/op
DoubleArraySum.federico1               500  avgt   10    1.999 ±  0.217  ms/op
DoubleArraySum.federico1              1000  avgt   10    6.087 ±  1.108  ms/op
DoubleArraySum.federico1              3000  avgt   10   40.825 ±  4.853  ms/op
DoubleArraySum.federico1              8000  avgt   10  267.446 ± 37.490  ms/op
DoubleArraySum.federico2               100  avgt   10    0.034 ±  0.003  ms/op
DoubleArraySum.federico2               500  avgt   10    0.974 ±  0.152  ms/op
DoubleArraySum.federico2              1000  avgt   10    3.245 ±  0.080  ms/op
DoubleArraySum.federico2              3000  avgt   10   30.503 ±  5.960  ms/op
DoubleArraySum.federico2              8000  avgt   10  183.183 ± 21.861  ms/op
DoubleArraySum.holijava                100  avgt   10    0.063 ±  0.002  ms/op
DoubleArraySum.holijava                500  avgt   10    1.112 ±  0.020  ms/op
DoubleArraySum.holijava               1000  avgt   10    4.138 ±  0.062  ms/op
DoubleArraySum.holijava               3000  avgt   10   41.784 ±  1.029  ms/op
DoubleArraySum.holijava               8000  avgt   10  266.590 ±  4.080  ms/op
DoubleArraySum.pivovarit               100  avgt   10    0.112 ±  0.002  ms/op
DoubleArraySum.pivovarit               500  avgt   10    2.427 ±  0.075  ms/op
DoubleArraySum.pivovarit              1000  avgt   10    9.572 ±  0.355  ms/op
DoubleArraySum.pivovarit              3000  avgt   10   84.413 ±  2.197  ms/op
DoubleArraySum.pivovarit              8000  avgt   10  690.942 ± 34.993  ms/op

EDIT

here is a more readable output (federico wins with all inputs)

100=[federico2, dkatzel, holijava, pivovarit, federico1, eugene]
500=[federico2, dkatzel, holijava, federico1, pivovarit, eugene]
1000=[federico2, holijava, dkatzel, federico1, pivovarit, eugene]
3000=[federico2, dkatzel, federico1, holijava, eugene, pivovarit]
8000=[federico2, dkatzel, holijava, federico1, eugene, pivovarit]
like image 40
Eugene Avatar answered Oct 09 '22 09:10

Eugene


The only option I see here is to more/less generate all possible pairs of indexes and then extract elements and apply summation. Using parallel streams will not have any additional positive effect here with such small example but you can surely utilize Stream API here(and convert to parallel if needed immediately), although the result is not that nice as expected:

IntStream.range(0, a.length).boxed()
      .flatMap(i -> IntStream.range(0, a[0].length)
        .mapToObj(j -> new AbstractMap.SimpleImmutableEntry<>(i, j)))
      .parallel()
      .forEach(e -> {
          res[e.getKey()][e.getValue()]
            = a[e.getKey()][e.getValue()] + b[e.getKey()][e.getValue()];
      });

We need to introduce a middleman(middlepair?) so that we can parallelize one Stream and not play with parallelized nested Streams.

Another advanced way would be to implement your own custom collector but it will still involve nested looping at some point.


The true power of Stream API can be observed when trying to sum all values from two arrays:

Stream.concat(Arrays.stream(a), Arrays.stream(b)).parallel()
      .flatMapToDouble(Arrays::stream)
      .sum();
like image 6
Grzegorz Piwowarek Avatar answered Oct 09 '22 10:10

Grzegorz Piwowarek