Compiler optimization is generally implemented using a sequence of optimizing transformations, algorithms which take a program and transform it to produce a semantically equivalent output program that uses fewer resources or executes faster.
A large variety of optimizations are provided by GCC. Most are categorized into one of three levels, but some are provided at multiple levels. Some optimizations reduce the size of the resulting machine code, while others try to create code that is faster, potentially increasing its size.
Use the command-line option -O0 (capital O, zero) to disable optimization, and -S to produce an assembly listing. See the GCC documentation for the full set of command-line options.
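For example, to compare the assembly GCC emits with and without optimization (the file name here is just illustrative):

gcc -O0 -S example.c -o example_O0.s
gcc -O2 -S example.c -o example_O2.s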
In some cases, one knows at compile time what a particular piece of algorithmic data looks like and might wish to convey this information to the compiler. This question is about how best to achieve that.
By way of example, consider a sparse matrix-vector multiplication in which the matrix is constant and known at compile time:
matrix = [ 0, 210, 0, 248, 137]
[ 0, 0, 0, 0, 239]
[ 0, 0, 0, 0, 0]
[116, 112, 0, 0, 7]
[ 0, 0, 0, 0, 165]
In such a case, an implementation of the matrix-vector multiplication for an arbitrary input vector can be written with no data-dependent branches:
#include <stdio.h>

/* Number of non-zero entries in the 5x5 matrix above, stored row by row. */
#define ARRAY_SIZE 8

/* Non-zero values, followed by the column (input index) and the row
   (output index) of each value. */
static const int matrix[ARRAY_SIZE]         = {210, 248, 137, 239, 116, 112, 7, 165};
static const int input_indices[ARRAY_SIZE]  = {1, 3, 4, 4, 0, 1, 4, 4};
static const int output_indices[ARRAY_SIZE] = {0, 0, 0, 1, 3, 3, 3, 4};

static void matrix_multiply(const int *input_array, int *output_array)
{
    /* One multiply-accumulate per stored non-zero; the trip count is a
       compile-time constant, so no branch depends on the data. */
    for (int i = 0; i < ARRAY_SIZE; ++i) {
        output_array[output_indices[i]] +=
            matrix[i] * input_array[input_indices[i]];
    }
}

int main(void)
{
    int test_input[5] = {36, 220, 212, 122, 39};
    int output[5] = {0};

    matrix_multiply(test_input, output);
    for (int i = 0; i < 5; ++i) {
        printf("%d\n", output[i]);
    }
    return 0;
}
which prints the correct result for the matrix-vector multiplication (81799, 9321, 0, 29089, 6435).
Further optimisations can be envisaged that build on data-specific knowledge, for example about locality of reference in memory.
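As one illustration of that (my own hypothetical refinement, not part of the code above): the entries happen to be sorted by output row, so the triplet tables can be collapsed into a compressed-row layout in which each row's sum accumulates in a register and is stored exactly once. A sketch, reusing the matrix and input_indices tables from above:

/* Hypothetical compressed-row variant: row_start[r] is the position in
   matrix[] where row r's entries begin; row_start[5] is one past the end. */
static const int row_start[6] = {0, 3, 4, 4, 7, 8};

static void matrix_multiply_csr(const int *input_array, int *output_array)
{
    for (int r = 0; r < 5; ++r) {
        int sum = 0;                      /* accumulates in a register */
        for (int k = row_start[r]; k < row_start[r + 1]; ++k) {
            sum += matrix[k] * input_array[input_indices[k]];
        }
        output_array[r] = sum;            /* exactly one store per row */
    }
}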
Now, clearly this is an approach that can be used, but it becomes unwieldy once the data gets big (say ~100MB in my case), and in any real-world situation it would depend on metaprogramming to generate the associated data-dependent code.
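To make the metaprogramming point concrete, here is a minimal sketch of what such a generation step might look like (my own illustration, not an established tool): a small program run at build time that scans the dense matrix and prints the three tables as C source, which can then be redirected into a header and #included:

#include <stdio.h>

#define ROWS 5
#define COLS 5

/* The dense, compile-time-known matrix from the example above. */
static const int dense[ROWS][COLS] = {
    {  0, 210, 0, 248, 137},
    {  0,   0, 0,   0, 239},
    {  0,   0, 0,   0,   0},
    {116, 112, 0,   0,   7},
    {  0,   0, 0,   0, 165},
};

/* Emit one table; 'which' selects what is printed for each non-zero
   entry: 0 = the value itself, 1 = its column, 2 = its row. */
static void emit_table(const char *name, int which)
{
    printf("static const int %s[ARRAY_SIZE] = {", name);
    int first = 1;
    for (int r = 0; r < ROWS; ++r) {
        for (int c = 0; c < COLS; ++c) {
            if (dense[r][c] == 0)
                continue;
            int v = (which == 0) ? dense[r][c] : (which == 1) ? c : r;
            printf(first ? "%d" : ", %d", v);
            first = 0;
        }
    }
    printf("};\n");
}

int main(void)
{
    /* Count the non-zeros first so ARRAY_SIZE can be emitted. */
    int nnz = 0;
    for (int r = 0; r < ROWS; ++r)
        for (int c = 0; c < COLS; ++c)
            if (dense[r][c] != 0)
                ++nnz;

    printf("#define ARRAY_SIZE %d\n", nnz);
    emit_table("matrix", 0);
    emit_table("input_indices", 1);
    emit_table("output_indices", 2);
    return 0;
}

Running it and redirecting the output (say, ./generate > sparse_tables.h) reproduces the tables in the hand-written version above.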
Does the general strategy of baking data-specific knowledge into the code have mileage as an optimisation? If so, what is the best approach for doing this?
In the example given, at one level the whole thing can be reduced to knowledge of ARRAY_SIZE, with the arrays set at runtime (sketched below). This leads me to think the approach is limited (and is really a data-structures problem), but I'm very interested to know whether the general approach of data-derived compile-time optimisations is useful in any situation.
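For concreteness, the runtime version I have in mind collapses to a kernel like this (hypothetical signature), where nothing about the data is visible to the compiler:

static void matrix_multiply_rt(int n, const int *values, const int *in_idx,
                               const int *out_idx, const int *input, int *output)
{
    /* Same loop body as before, but n and the tables are runtime inputs. */
    for (int i = 0; i < n; ++i)
        output[out_idx[i]] += values[i] * input[in_idx[i]];
}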