Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does reducing the number of loops not speed up the program?

I have a program that does a lot of matrix multiplication. I thought I'd speed it up by reducing the number of loops in the code to see how much faster it would be (I'll try a matrix math library later). It turns out it's not faster at all. I've been able to replicate the problem with some example code. My guess was that testOne() would be faster than testTwo() because it doesn't create any new arrays and because it has a third as many loops. On my machine, its takes twice as long to run:

Duration for testOne with 5000 epochs: 657, loopCount: 64000000

Duration for testTwo with 5000 epochs: 365, loopCount: 192000000

My guess is that multOne() is slower than multTwo() because in multOne() the CPU is not writing to sequential memory addresses like it is in multTwo(). Does that sound right? Any explanations would be appreciated.

import java.util.Random;

public class ArrayTest {

    double[] arrayOne;
    double[] arrayTwo;
    double[] arrayThree;

    double[][] matrix;

    double[] input;
    int loopCount;

    int rows;
    int columns;

    public ArrayTest(int rows, int columns) {
        this.rows = rows;
        this.columns = columns;
        this.loopCount = 0;
        arrayOne = new double[rows];
        arrayTwo = new double[rows];
        arrayThree = new double[rows];
        matrix = new double[rows][columns];
        Random random = new Random();
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < columns; j++) {
                matrix[i][j] = random.nextDouble();
            }
        }
    }

    public void testOne(double[] input, int epochs) {
        this.input = input;
        this.loopCount = 0;
        long start = System.currentTimeMillis();
        long duration;
        for (int i = 0; i < epochs; i++) {
            multOne();
        }
        duration = System.currentTimeMillis() - start;
        System.out.println("Duration for testOne with " + epochs + " epochs: " + duration + ", loopCount: " + loopCount);
    }

    public void multOne() {
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < columns; j++) {
                arrayOne[i] += matrix[i][j] * arrayOne[i] * input[j];
                arrayTwo[i] += matrix[i][j] * arrayTwo[i] * input[j];
                arrayThree[i] += matrix[i][j] * arrayThree[i] * input[j];
                loopCount++;
            }
        }
    }

    public void testTwo(double[] input, int epochs) {

        this.loopCount = 0;
        long start = System.currentTimeMillis();
        long duration;
        for (int i = 0; i < epochs; i++) {
            arrayOne = multTwo(matrix, arrayOne, input);
            arrayTwo = multTwo(matrix, arrayTwo, input);
            arrayThree = multTwo(matrix, arrayThree, input);
        }
        duration = System.currentTimeMillis() - start;
        System.out.println("Duration for testTwo with " + epochs + " epochs: " + duration + ", loopCount: " + loopCount);
    }

    public double[] multTwo(double[][] matrix, double[] array, double[] input) {
        double[] newArray = new double[rows];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < columns; j++) {
                newArray[i] += matrix[i][j] * array[i] * input[j];
                loopCount++;
            }
        }
        return newArray;
    }

    public static void main(String[] args) {
        int rows = 100;
        int columns = 128;
        ArrayTest arrayTest = new ArrayTest(rows, columns);
        Random random = new Random();
        double[] input = new double[columns];
        for (int i = 0; i < columns; i++) {
            input[i] = random.nextDouble();
        }
        arrayTest.testOne(input, 5000);
        arrayTest.testTwo(input, 5000);
    }
}
like image 798
KrakathorOfTheLavaPeople Avatar asked Oct 30 '22 02:10

KrakathorOfTheLavaPeople


1 Answers

There is a simple reason why your tests take different time: they don't do the same thing. Since the two loops you compare are not functionally identical, the number of iterations is not a good metric to look at.

testOne takes longer than testTwo because:

  • In multOne you update arrayOne[i] in place, during each iteration of the j loop. This means for each iteration of the inner loop j you are using a new value of arrayOne[i], computed in the previous iteration. This creates a loop carried dependency, which is harder to optimise for the compiler, because you require the output of the operation matrix[i][j] * arrayOne[i] * input[j] on the next CPU clock cycle. This is not really possible with floating point operations, which have a latency of a few clock cycles usually, so it results in stalls, therefore reduced performance.

  • In testTwo you update arrayOne only once per each iteration of the epoch, and since there are no carried dependecies, the loop can be vectorised efficiently, which results in better cache and arithmetic performance.

like image 100
paul-g Avatar answered Nov 15 '22 05:11

paul-g