Optimization in GCC

Tags: optimization, gcc

I have two questions:

(1) I learned somewhere that -O3 is not recommended with GCC, because

The -O3 optimization level may increase the speed of the resulting executable, but can also increase its size. Under some circumstances where these optimizations are not favorable, this option might actually make a program slower. In fact, it should not be used system-wide with gcc 4.x. The behavior of gcc has changed significantly since version 3.x. In 3.x, -O3 has been shown to lead to marginally faster execution times over -O2, but this is no longer the case with gcc 4.x. Compiling all your packages with -O3 will result in larger binaries that require more memory, and will significantly increase the odds of compilation failure or unexpected program behavior (including errors). The downsides outweigh the benefits; remember the principle of diminishing returns. Using -O3 is not recommended for gcc 4.x.

Suppose I have a workstation (Kubuntu 9.04) with 128 GB of memory and 24 cores. It is shared by many users, some of whom may run intensive programs that use around 60 GB of memory. Is -O2 a better choice for me than -O3?

(2) I also learned that when a running program crashes unexpectedly, any debugging information is better than none, so the use of -g is recommended for optimized programs, both for development and deployment. But if I compile with -ggdb3 together with -O2 or -O3, will it slow down execution? Assume I am still using the same workstation.


asked Sep 02 '09 16:09

Tim

2 Answers

(1) The only way to know for sure is to benchmark your application compiled with -O2 and with -O3. -O3 also bundles several individual optimization options that you can turn on and off separately. As for the warning about larger binaries, comparing the sizes of executables built with -O2 and -O3 will not tell you much, because what matters most is the size of the small, critical inner loops. You really have to benchmark.

(2) It will result in a larger executable, but there shouldn't be any measurable slowdown.

answered Oct 15 '22 15:10

Laurynas Biveinis

Try it.
You can rarely make accurate judgments about speed and optimisation without any data.

P.S. This will also tell you whether it's worth the effort. Is saving a few milliseconds in a function that runs once at startup really worthwhile?

answered Oct 15 '22 15:10

Martin Beckett
