Loop unrolling is a common optimization, but is the reverse done too?
(to reduce size of the object file output, a smaller binary).
I'm curious if it's a common technique for compilers to de-duplicate successive, identical blocks of code (or function calls) into a loop, or extract a duplicate block into a static function.
I'm interested because there are header-only libraries*
in C which can add a lot of duplicate code, so it would be useful to know if some C compilers are able to detect this and handle it more efficiently.
*
By header-only-library, I mean a header that defines code directly rather than function definitions.
And if this is done, it would be useful to know under what conditions & constraints apply to ensure it can be made use of.
Note (For the purpose of the question - any popular C compiler is fine GCC/Clang/Intel/MSVC).
A header-only library I found, called uthash uses some very large macros, and I wanted to know if there was some compiler trickery going on that could cleverly de-duplicate such huge blocks of code, see: eg uthash.h, another similar example is inline qsort.h
Example of a block that could be de-duplicated (it turns out Py_DECREF
can expand into a fairly large block of code).
#define PY_ADD_TO_DICT(dict, i) \
do {
PyDict_SetItemString(dict, names[i], item = PyUnicode_FromString(values[i])); \
Py_DECREF(item); \
} while (0)
/* this could be made into a loop */
PY_ADD_TO_DICT(d, 0);
PY_ADD_TO_DICT(d, 1);
PY_ADD_TO_DICT(d, 2);
PY_ADD_TO_DICT(d, 3);
Note, this is contrived, but based on a real example.
It seems the short answer to my question is no (or only in some limited / trivial cases), just to clarify why I was asking a bit further.
Some replies in comments seem to assume anyone you would just refactor code into a function.
This is almost-always the best option of course, nevertheless there are times when blocks of very similar code can show up.
In all these cases its possible to de-duplicate (generate smarter code, wrap macros in functions, patch 3rd party libraries), but before going making an effort to do such things its worth knowing how much work the compiler is doing for us.
Depending on the toolchain, you may have options to coach the compiler and linker into recognizing and coalescing redundant code. Some good google keywords include:
Note that the gcc optimizations page mentioned in previous comments provides some flags of interest, namely:
Finally, these blog posts are informative:
What might be the easiest way to describe this is referred to as refactoring code. This is the sort of thing you could do by hand as the developer, but this is not a feature of modern C compilers and optimizers, at least as far as GCC is conncerned. Included below is a list of all the individual optimizations you can set (via -O0
and -O2
, respectively) in GCC, for example, and none of them refactor a series of similar statements to use a common index in a loop.
Optimization Level Zero Optimization Level Two
============================================================================================================
-fauto-inc-dec -fthread-jumps
-fbranch-count-reg -falign-functions -falign-jumps
-fcombine-stack-adjustments -falign-loops -falign-labels
-fcompare-elim -fcaller-saves
-fcprop-registers -fcrossjumping
-fdce -fcse-follow-jumps -fcse-skip-blocks
-fdefer-pop -fdelete-null-pointer-checks
-fdelayed-branch -fdevirtualize -fdevirtualize-speculatively
-fdse -fexpensive-optimizations
-fforward-propagate -fgcse -fgcse-lm
-fguess-branch-probability -fhoist-adjacent-loads
-fif-conversion2 -finline-small-functions
-fif-conversion -findirect-inlining
-finline-functions-called-once -fipa-cp
-fipa-pure-const -fipa-sra
-fipa-profile -fisolate-erroneous-paths-dereference
-fipa-reference -foptimize-sibling-calls
-fmerge-constants -foptimize-strlen
-fmove-loop-invariants -fpartial-inlining
-fshrink-wrap -fpeephole2
-fsplit-wide-types -freorder-blocks -freorder-blocks-and-partition -freorder-functions
-ftree-bit-ccp -frerun-cse-after-loop
-ftree-ccp -fsched-interblock -fsched-spec
-fssa-phiopt -fschedule-insns -fschedule-insns2
-ftree-ch -fstrict-aliasing -fstrict-overflow
-ftree-copy-prop -ftree-builtin-call-dce
-ftree-copyrename -ftree-switch-conversion -ftree-tail-merge
-ftree-dce -ftree-pre
-ftree-dominator-opts -ftree-vrp
-ftree-dse -fuse-caller-save
-ftree-forwprop
-ftree-fre
-ftree-phiprop
-ftree-sink
-ftree-slsr
-ftree-sra
-ftree-pta
-ftree-ter
-funit-at-a-time
References
<https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html>
Although it seems to be very doable, I have never seen any optimization which will take two similar pieces of code and combine them into a single loop.
However, there are two cases of duplication which are removed by the compiler (most obviously, because there's also a performance gain, not just a code size gain):
When two expression evaluate to the same value, in the same function, a technique called common subexpression elimination removes the redundant computations.
Interestingly, with aggressive inlining and link-time optimization more and more cases could be covered by this - but of course inlining has its own severe binary size cost.
When it's possible to vectorize the code, a vectorizing compiler can often identify that and do the vectorization for you.
Other than that, you'll have to manually refactor the code to remove duplication.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With