In the book Programming Massively Parallel Processors, the number of GFLOPS is used to compare the efficiency of different matrix multiplication kernels. How would I compute this for my own kernels on my own machine?
Somewhere in the NVIDIA forums I found this 'algorithm', but I don't know how valid it is or where the factor of two comes from.
NumOps = 2 * pow(MatrixSize,3)
gflops = 1.0e-9 * NumOps / ExecutionTime
You can measure the GFLOPS by running the algorithm with a large input, measuring the execution time, and then plugging the execution time and matrix size into that formula. For matrix sizes big enough to keep the entire machine busy, the GFLOPS rate depends only weakly on matrix size.
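As a minimal sketch of what that measurement can look like in practice: the code below times a kernel launch with CUDA events and then applies the formula. The kernel matMulKernel shown here is just a naive placeholder (substitute your own), and the block size 16x16 and N = 1024 are arbitrary illustrative choices; the device buffers are left uninitialized because only the timing matters for this sketch.

#include <cstdio>
#include <cuda_runtime.h>

// Naive placeholder kernel; substitute your own matrix multiplication kernel here.
__global__ void matMulKernel(const float *A, const float *B, float *C, int n)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; k++)
            sum += A[row * n + k] * B[k * n + col];
        C[row * n + col] = sum;
    }
}

int main()
{
    const int n = 1024;                        // pick a size large enough to keep the GPU busy
    size_t bytes = (size_t)n * n * sizeof(float);

    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);                    // contents left uninitialized; fine for a timing sketch
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);

    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);

    // Time the kernel with CUDA events. Kernel launches are asynchronous,
    // so a plain CPU timer without a synchronize would under-measure.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    matMulKernel<<<grid, block>>>(dA, dB, dC, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);    // elapsed time in milliseconds

    double numOps = 2.0 * n * (double)n * n;   // 2 * MatrixSize^3
    double gflops = 1.0e-9 * numOps / (ms * 1.0e-3);
    printf("N = %d: %.3f ms, %.1f GFLOPS\n", n, ms, gflops);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}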
The GPU matrix multiplication algorithm performs the same number of floating-point operations as the naive algorithm:
// one multiply and one add per iteration of the innermost loop
for (int i = 0; i < MatrixSize; i++)
    for (int j = 0; j < MatrixSize; j++)
        for (int k = 0; k < MatrixSize; k++)
            C[j][i] += A[j][k] * B[k][i];
There are 2 floating-point operations in the loop body (one multiplication and one addition), and MatrixSize * MatrixSize * MatrixSize iterations of the loop body, which gives you the formula for NumOps. GFLOPS is just the number of floating-point operations per second, divided by 10^9 ('giga').
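To make the arithmetic concrete (the 10 ms timing below is just an illustrative number, not a measurement):

NumOps = 2 * 1024^3 ≈ 2.15e9
gflops = 1.0e-9 * 2.15e9 / 0.010 ≈ 215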