Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Do C compilers de-duplicate (merge) code?

Loop unrolling is a common optimization, but is the reverse done too?
(to reduce size of the object file output, a smaller binary).

I'm curious if it's a common technique for compilers to de-duplicate successive, identical blocks of code (or function calls) into a loop, or extract a duplicate block into a static function.

I'm interested because there are header-only libraries* in C which can add a lot of duplicate code, so it would be useful to know if some C compilers are able to detect this and handle it more efficiently.

* By header-only-library, I mean a header that defines code directly rather than function definitions.

And if this is done, it would be useful to know under what conditions & constraints apply to ensure it can be made use of.

Note (For the purpose of the question - any popular C compiler is fine GCC/Clang/Intel/MSVC).

A header-only library I found, called uthash uses some very large macros, and I wanted to know if there was some compiler trickery going on that could cleverly de-duplicate such huge blocks of code, see: eg uthash.h, another similar example is inline qsort.h


Example of a block that could be de-duplicated (it turns out Py_DECREF can expand into a fairly large block of code).

#define PY_ADD_TO_DICT(dict, i)  \
    do {
        PyDict_SetItemString(dict, names[i], item = PyUnicode_FromString(values[i])); \
        Py_DECREF(item); \
    } while (0)

/* this could be made into a loop */
PY_ADD_TO_DICT(d, 0);
PY_ADD_TO_DICT(d, 1);
PY_ADD_TO_DICT(d, 2);
PY_ADD_TO_DICT(d, 3);

Note, this is contrived, but based on a real example.


Clarification

It seems the short answer to my question is no (or only in some limited / trivial cases), just to clarify why I was asking a bit further.

Some replies in comments seem to assume anyone you would just refactor code into a function.
This is almost-always the best option of course, nevertheless there are times when blocks of very similar code can show up.

  • boiler plate code created by a not-very-smart code generator.
  • when using an external API which exposes some of its functionality as macros (in this case wrapping locally in functions works in most cases of course, but this means your code gets its own quirks which aren't typical for uses of those API's).
  • when you can't replace macros with functions, there are some rare cases where it just ends up being impractical to do this.
  • when importing code from an external code-base, its not always ideal to go in and start cleaning up their code, when evaluating that code-base its useful to have an idea how smart the compiler will be at optimizing the code.

In all these cases its possible to de-duplicate (generate smarter code, wrap macros in functions, patch 3rd party libraries), but before going making an effort to do such things its worth knowing how much work the compiler is doing for us.

like image 772
ideasman42 Avatar asked Aug 08 '14 01:08

ideasman42


3 Answers

Depending on the toolchain, you may have options to coach the compiler and linker into recognizing and coalescing redundant code. Some good google keywords include:

  • "code factoring" and additional keywords
  • "whole program optimization"
  • "interprocedural optimization" Wikipedia
  • "link time optimization" LTO

Note that the gcc optimizations page mentioned in previous comments provides some flags of interest, namely:

  • -ftree-tail-merge
  • -ftree-switch-conversion
  • -fgcse
  • -fcrossjumping
  • -fipa-pta
  • -fipa-icf (identical code folding), added in GCC5.x
  • -fopa-cp
  • -flto
  • -fwhole-program

Finally, these blog posts are informative:

  • http://hubicka.blogspot.com/2014/04/linktime-optimization-in-gcc-1-brief.html
  • http://hubicka.blogspot.com/2014/04/linktime-optimization-in-gcc-2-firefox.html
like image 172
Throwback1986 Avatar answered Nov 15 '22 22:11

Throwback1986


What might be the easiest way to describe this is referred to as refactoring code. This is the sort of thing you could do by hand as the developer, but this is not a feature of modern C compilers and optimizers, at least as far as GCC is conncerned. Included below is a list of all the individual optimizations you can set (via -O0 and -O2, respectively) in GCC, for example, and none of them refactor a series of similar statements to use a common index in a loop.

 Optimization Level Zero              Optimization Level Two
 ============================================================================================================
 -fauto-inc-dec                       -fthread-jumps 
 -fbranch-count-reg                   -falign-functions  -falign-jumps 
 -fcombine-stack-adjustments          -falign-loops  -falign-labels 
 -fcompare-elim                       -fcaller-saves 
 -fcprop-registers                    -fcrossjumping 
 -fdce                                -fcse-follow-jumps  -fcse-skip-blocks 
 -fdefer-pop                          -fdelete-null-pointer-checks 
 -fdelayed-branch                     -fdevirtualize -fdevirtualize-speculatively 
 -fdse                                -fexpensive-optimizations 
 -fforward-propagate                  -fgcse  -fgcse-lm  
 -fguess-branch-probability           -fhoist-adjacent-loads 
 -fif-conversion2                     -finline-small-functions 
 -fif-conversion                      -findirect-inlining 
 -finline-functions-called-once       -fipa-cp 
 -fipa-pure-const                     -fipa-sra 
 -fipa-profile                        -fisolate-erroneous-paths-dereference 
 -fipa-reference                      -foptimize-sibling-calls 
 -fmerge-constants                    -foptimize-strlen 
 -fmove-loop-invariants               -fpartial-inlining 
 -fshrink-wrap                        -fpeephole2 
 -fsplit-wide-types                   -freorder-blocks -freorder-blocks-and-partition -freorder-functions 
 -ftree-bit-ccp                       -frerun-cse-after-loop  
 -ftree-ccp                           -fsched-interblock  -fsched-spec 
 -fssa-phiopt                         -fschedule-insns  -fschedule-insns2 
 -ftree-ch                            -fstrict-aliasing -fstrict-overflow 
 -ftree-copy-prop                     -ftree-builtin-call-dce 
 -ftree-copyrename                    -ftree-switch-conversion -ftree-tail-merge 
 -ftree-dce                           -ftree-pre 
 -ftree-dominator-opts                -ftree-vrp 
 -ftree-dse                           -fuse-caller-save
 -ftree-forwprop                
 -ftree-fre                     
 -ftree-phiprop                 
 -ftree-sink                    
 -ftree-slsr                    
 -ftree-sra                     
 -ftree-pta                     
 -ftree-ter                     
 -funit-at-a-time               

References


  1. 3.10 - Options that control optimization, Accessed 2014-07-07, <https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html>
like image 44
Cloud Avatar answered Nov 15 '22 22:11

Cloud


Although it seems to be very doable, I have never seen any optimization which will take two similar pieces of code and combine them into a single loop.

However, there are two cases of duplication which are removed by the compiler (most obviously, because there's also a performance gain, not just a code size gain):

  1. When two expression evaluate to the same value, in the same function, a technique called common subexpression elimination removes the redundant computations.
    Interestingly, with aggressive inlining and link-time optimization more and more cases could be covered by this - but of course inlining has its own severe binary size cost.

  2. When it's possible to vectorize the code, a vectorizing compiler can often identify that and do the vectorization for you.

Other than that, you'll have to manually refactor the code to remove duplication.

like image 3
Oak Avatar answered Nov 15 '22 21:11

Oak