Are these comments about GCC's optimisations valid?

Tags:

At the bottom of Demystifying the Restrict Keyword is this curious advice:

Due to the order in which scheduling is done in GCC, it is always better to simplify expressions. Do not mix memory access with calculations. The code can be re-written as follows:

then there is an example which is essentially transforming this

velocity_x[i] += acceleration_x[i] * time_step;

into this

const float ax  = acceleration_x[i];       // Then the same follows for y, z const float vx  = velocity_x[i];           // etc for y, z const float nvx = vx + ( ax * time_step ); // etc velocity_x[i]   = nvx;                     // ...

Really? I would have thought this sort of transformation was trivial compared to other stuff optimising compilers have to do, such as lambda arguments to std::foreach and so on.

Is this just stale, silly advice? Or is there a good reason why GCC can't or won't do this? (It makes me worry about writing the above as velocity += acceleration * time_step using my Vector3f class!

820

asked Feb 26 '13 15:02

spraff

2 Answers

Edit: (I'm removing details about restrict because it deviates from the actual question being asked and is causing confusion. The OP is assuming restict is used.)

The transformation in your question is indeed trivial for an optimizing compiler, but that is not what Acton's paper is suggesting.

Here is the transformation done in the paper:

This code...

  for (size_t i=0;i<count*stride;i+=stride)   {     velocity_x[i] += acceleration_x[i] * time_step;     velocity_y[i] += acceleration_y[i] * time_step;     velocity_z[i] += acceleration_z[i] * time_step;     position_x[i] += velocity_x[i]     * time_step;     position_y[i] += velocity_y[i]     * time_step;     position_z[i] += velocity_z[i]     * time_step;   }

... was transformed into this code:

  for (size_t i=0;i<count*stride;i+=stride)   {     const float ax  = acceleration_x[i];     const float ay  = acceleration_y[i];     const float az  = acceleration_z[i];     const float vx  = velocity_x[i];     const float vy  = velocity_y[i];     const float vz  = velocity_z[i];     const float px  = position_x[i];     const float py  = position_y[i];     const float pz  = position_z[i];      const float nvx = vx + ( ax * time_step );     const float nvy = vy + ( ay * time_step );     const float nvz = vz + ( az * time_step );     const float npx = px + ( vx * time_step );     const float npy = py + ( vy * time_step );     const float npz = pz + ( vz * time_step );      velocity_x[i]   = nvx;     velocity_y[i]   = nvy;     velocity_z[i]   = nvz;     position_x[i]   = npx;     position_y[i]   = npy;     position_z[i]   = npz;   }

What is the optimization?

The optimization is not - as suggested - the separation of 1 expression into 3 expressions.

The optimization is the insertion of useful instructions between the instructions that operate on any particular piece of data.

If you follow the data moving from velocity_x[i] to vx to nvx back to velocity_x[i], the CPU is doing other work between each of those steps.

Why is this an optimization?

Modern CPUs typically have a pipelined architecture.

Since instructions are executed in phases, the CPU allows multiple instructions to be processed at the same time. However, when an instruction requires the result of another instruction that hasn't been fully executed, this pipeline is stalled. No further instructions are executed until the stalled instruction can run.

Why isn't my optimizing compiler doing this automatically?

Some do.

GCC stands out as being relatively poor with this optimization.

I disassembled both loops above using gcc 4.7 (x86-64 architecture, optimization at -O3). Similar assembly was produced, but the order of the instructions was different and the first version produced significant stalls where a single float would be loaded, changed, and stored within the span of a few instructions.

You can read a little about gcc's instruction scheduling here, or just search the web for gcc instruction scheduling to see a lot of frustrated articles about this issue.

188

answered Sep 19 '22 18:09

Drew Dormann

In my opinion, stale/silly advice. I mean that level of detail is specific to the compiler, compiler version, processor, processor version, etc. I'd stick with readability and let the compiler do its job. If someone is that worried about a possible clock cycle or two in a certain target, write some #def assembly for that target and leave the higher-level code there for other targets and reference.

answered Sep 23 '22 18:09

mark

Related questions
                            
                                In C++, is it better to cap a value using std::min or an if branch?
                            
                                Difference between pic Vs pie [duplicate]
                            
                                Force maven to look for dependency in local repository
                            
                                git pull gives error: 401 Authorization Required while accessing https://git.foo.com/bar.git
                            
                                Pass array written in template to meteor/handlebars helper
                            
                                How to set the (OAuth token) Authorization Header on an Android OKHTTPClient request
                            
                                Entity Framework 5 code-first not creating database
                            
                                C++: How can I return a negative value in main.cpp
                            
                                php image file upload and convert to base64 without saving image
                            
                                C++11 backwards compatibility
                            
                                tcpdump - resolve IP and skip resolving ports
                            
                                Why can't I change the value of String.Empty?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With