I have a C++ project with many targets that include a lot of Boost headers and other line-intensive headers. Most of the targets include the same headers, so I thought this would be an ideal case for precompiled headers (PCH). I created a header file containing the most frequently included headers and precompiled it.
This reduced the size of the compilation unit from 350k to 120k lines of code (I passed the -save-temps flag to gcc to check that). I verified with the -H parameter that the PCH is actually used: it is listed with an exclamation mark in front of it.
The precompiled header is 550 MB in size.
However, the compile time only dropped from 23 seconds to 20 seconds.
Is such a small improvement to be expected from precompiled headers? If not, what am I doing wrong? What speeds up compilation with precompiled headers the most?
Edit: This is the gcc command:
/usr/bin/c++ -fPIC -I/projectDir/build/source -I/projectDir/source -I/usr/include/eigen3 -include /projectDir/build/source/Core/core/cotire/Core_ORIGINAL_CXX_prefix.hxx -Winvalid-pch -g -Wall -Wextra -Wno-long-long -Wno-unused-parameter -std=c++0x -DBOOST_ENABLE_ASSERT_HANDLER -D_REENTRANT -o CMakeFiles/SubProject.dir/cotire/SubProject_ORIGINAL_CXX_unity.cxx.o -c /projectDir/build/source/ArmarXCore/statechart/cotire/SubProject_ORIGINAL_CXX_unity.cxx
Passing -ftime-report gives me the following output (with PCH enabled):
Execution times (seconds)
phase setup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 1321 kB ( 0%) ggc
phase parsing : 7.29 (32%) usr 1.69 (51%) sys 8.99 (35%) wall 1135793 kB (54%) ggc
phase lang. deferred : 2.75 (12%) usr 0.40 (12%) sys 3.15 (12%) wall 317920 kB (15%) ggc
phase opt and generate : 12.03 (53%) usr 1.17 (36%) sys 13.22 (51%) wall 622545 kB (30%) ggc
phase check & debug info: 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 440 kB ( 0%) ggc
phase last asm : 0.63 ( 3%) usr 0.02 ( 1%) sys 0.64 ( 2%) wall 26440 kB ( 1%) ggc
phase finalize : 0.00 ( 0%) usr 0.01 ( 0%) sys 0.02 ( 0%) wall 0 kB ( 0%) ggc
|name lookup : 1.30 ( 6%) usr 0.29 ( 9%) sys 1.42 ( 5%) wall 153617 kB ( 7%) ggc
|overload resolution : 3.37 (15%) usr 0.59 (18%) sys 3.30 (13%) wall 360551 kB (17%) ggc
garbage collection : 1.80 ( 8%) usr 0.01 ( 0%) sys 1.82 ( 7%) wall 0 kB ( 0%) ggc
dump files : 0.11 ( 0%) usr 0.05 ( 2%) sys 0.18 ( 1%) wall 0 kB ( 0%) ggc
callgraph construction : 0.44 ( 2%) usr 0.10 ( 3%) sys 0.59 ( 2%) wall 26388 kB ( 1%) ggc
callgraph optimization : 0.21 ( 1%) usr 0.11 ( 3%) sys 0.23 ( 1%) wall 16131 kB ( 1%) ggc
ipa free inline summary : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
cfg construction : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 2119 kB ( 0%) ggc
cfg cleanup : 0.08 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall 169 kB ( 0%) ggc
trivially dead code : 0.05 ( 0%) usr 0.02 ( 1%) sys 0.13 ( 0%) wall 0 kB ( 0%) ggc
df scan insns : 0.30 ( 1%) usr 0.02 ( 1%) sys 0.38 ( 1%) wall 1126 kB ( 0%) ggc
df live regs : 0.07 ( 0%) usr 0.00 ( 0%) sys 0.10 ( 0%) wall 0 kB ( 0%) ggc
df reg dead/unused notes: 0.10 ( 0%) usr 0.03 ( 1%) sys 0.12 ( 0%) wall 7774 kB ( 0%) ggc
register information : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 0 kB ( 0%) ggc
alias analysis : 0.02 ( 0%) usr 0.02 ( 1%) sys 0.08 ( 0%) wall 2621 kB ( 0%) ggc
rebuild jump labels : 0.05 ( 0%) usr 0.01 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
preprocessing : 1.16 ( 5%) usr 0.45 (14%) sys 1.61 ( 6%) wall 209848 kB (10%) ggc
parser (global) : 0.43 ( 2%) usr 0.29 ( 9%) sys 0.83 ( 3%) wall 193966 kB ( 9%) ggc
parser struct body : 1.03 ( 5%) usr 0.20 ( 6%) sys 1.37 ( 5%) wall 199825 kB ( 9%) ggc
parser enumerator list : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 574 kB ( 0%) ggc
parser function body : 0.53 ( 2%) usr 0.06 ( 2%) sys 0.49 ( 2%) wall 35252 kB ( 2%) ggc
parser inl. func. body : 0.13 ( 1%) usr 0.03 ( 1%) sys 0.14 ( 1%) wall 11720 kB ( 1%) ggc
parser inl. meth. body : 1.14 ( 5%) usr 0.19 ( 6%) sys 1.45 ( 6%) wall 115776 kB ( 6%) ggc
template instantiation : 4.11 (18%) usr 0.82 (25%) sys 4.78 (18%) wall 566245 kB (27%) ggc
inline parameters : 0.05 ( 0%) usr 0.01 ( 0%) sys 0.03 ( 0%) wall 12792 kB ( 1%) ggc
tree gimplify : 0.28 ( 1%) usr 0.03 ( 1%) sys 0.27 ( 1%) wall 55239 kB ( 3%) ggc
tree eh : 0.19 ( 1%) usr 0.00 ( 0%) sys 0.14 ( 1%) wall 20091 kB ( 1%) ggc
tree CFG construction : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 34452 kB ( 2%) ggc
tree CFG cleanup : 0.09 ( 0%) usr 0.02 ( 1%) sys 0.15 ( 1%) wall 27 kB ( 0%) ggc
tree PHI insertion : 0.01 ( 0%) usr 0.01 ( 0%) sys 0.01 ( 0%) wall 5960 kB ( 0%) ggc
tree SSA rewrite : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.04 ( 0%) wall 8035 kB ( 0%) ggc
tree SSA other : 0.04 ( 0%) usr 0.03 ( 1%) sys 0.12 ( 0%) wall 1604 kB ( 0%) ggc
tree operand scan : 0.06 ( 0%) usr 0.04 ( 1%) sys 0.08 ( 0%) wall 16681 kB ( 1%) ggc
dominance frontiers : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
dominance computation : 0.14 ( 1%) usr 0.04 ( 1%) sys 0.12 ( 0%) wall 0 kB ( 0%) ggc
out of ssa : 0.04 ( 0%) usr 0.03 ( 1%) sys 0.14 ( 1%) wall 8 kB ( 0%) ggc
expand vars : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.14 ( 1%) wall 10387 kB ( 0%) ggc
expand : 0.79 ( 3%) usr 0.05 ( 2%) sys 0.77 ( 3%) wall 89756 kB ( 4%) ggc
post expand cleanups : 0.10 ( 0%) usr 0.00 ( 0%) sys 0.05 ( 0%) wall 14796 kB ( 1%) ggc
varconst : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 532 kB ( 0%) ggc
jump : 0.00 ( 0%) usr 0.01 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
integrated RA : 4.92 (22%) usr 0.12 ( 4%) sys 4.54 (17%) wall 167029 kB ( 8%) ggc
LRA non-specific : 0.38 ( 2%) usr 0.01 ( 0%) sys 0.81 ( 3%) wall 776 kB ( 0%) ggc
LRA virtuals elimination: 0.07 ( 0%) usr 0.00 ( 0%) sys 0.07 ( 0%) wall 6530 kB ( 0%) ggc
LRA reload inheritance : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 4 kB ( 0%) ggc
LRA create live ranges : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall 40 kB ( 0%) ggc
LRA hard reg assignment : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
reload : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall 0 kB ( 0%) ggc
thread pro- & epilogue : 0.16 ( 1%) usr 0.01 ( 0%) sys 0.26 ( 1%) wall 19997 kB ( 1%) ggc
shorten branches : 0.17 ( 1%) usr 0.01 ( 0%) sys 0.16 ( 1%) wall 0 kB ( 0%) ggc
reg stack : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall 0 kB ( 0%) ggc
final : 0.63 ( 3%) usr 0.04 ( 1%) sys 0.69 ( 3%) wall 29353 kB ( 1%) ggc
symout : 1.28 ( 6%) usr 0.06 ( 2%) sys 1.23 ( 5%) wall 173563 kB ( 8%) ggc
uninit var analysis : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 0 kB ( 0%) ggc
rest of compilation : 0.81 ( 4%) usr 0.18 ( 5%) sys 0.93 ( 4%) wall 34415 kB ( 2%) ggc
unaccounted todo : 0.25 ( 1%) usr 0.16 ( 5%) sys 0.39 ( 1%) wall 0 kB ( 0%) ggc
TOTAL : 22.71 3.29 26.03 2104543 kB
thanks veio
I haven't seen -ftime-report before. It actually gives some interesting information about the bottleneck:
phase opt and generate : 12.03 (53%) usr 1.17 (36%) sys 13.22 (51%)
Half the time is spent in optimization and code generation, which PCH won't solve. PCH is meant to avoid recompiling the same include files for every translation unit. A unity build is essentially one large translation unit, so recompiling headers should not be a bottleneck. Unity builds generally take longer to optimize, though, since the cost of compiler optimization normally doesn't scale linearly with translation unit size.
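To put a rough number on that, here is a small sketch that applies Amdahl's law to the wall-clock figures from the report above. The helper name max_speedup is made up for illustration, and the bound assumes that the remaining phases take just as long once parsing is gone.

```shell
#!/bin/sh
# Upper bound on the speedup if one phase vanished entirely (Amdahl's law).
# max_speedup is a hypothetical helper, not part of GCC.
#   $1 = total wall time in seconds
#   $2 = wall time of the phase that is assumed to disappear
max_speedup() {
    LC_ALL=C awk -v total="$1" -v phase="$2" \
        'BEGIN { printf "%.2f\n", total / (total - phase) }'
}

# Wall times taken from the report: TOTAL 26.03 s, phase parsing 8.99 s.
max_speedup 26.03 8.99   # prints 1.53
```

So even if the remaining 8.99 s of parsing disappeared completely, the build would be about 1.5 times faster at best, because the 13.22 s spent in "phase opt and generate" are untouched by PCH.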
However, since optimization is generally designed for non-unity builds, one possible improvement might be to use -flto instead. GCC LTO can be parallelized by passing a job count, e.g. -flto=8. The speedup will most likely be smaller than the number of jobs, though, since not all of the work can run in parallel. FYI, you might also need to switch your linker to ld.gold.
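As a rough sketch of what such an LTO setup could look like on the command line: the script below only prints the two commands (a dry run) instead of executing them. The file names unit.cxx, unit.o and app are placeholders, the job count of 8 is arbitrary, and -O2 is an assumed optimization level, since the command line in the question does not set one.

```shell
#!/bin/sh
# Dry run: prints the commands instead of executing them.
# unit.cxx, unit.o and app are placeholder names; -O2 is an assumed
# optimization level that does not appear in the original command line.
CXX=/usr/bin/c++
JOBS=8

# Compile: -flto stores the compiler's intermediate representation
# in the object file so optimization can be finished at link time.
compile_cmd="$CXX -std=c++0x -O2 -flto -c unit.cxx -o unit.o"

# Link: -flto=$JOBS allows up to $JOBS parallel link-time optimization jobs,
# and -fuse-ld=gold asks the GCC driver to link with ld.gold.
link_cmd="$CXX -O2 -flto=$JOBS -fuse-ld=gold unit.o -o app"

echo "$compile_cmd"
echo "$link_cmd"
```

Whether this is actually faster than the unity build depends on the project, so it is worth comparing both variants with -ftime-report or a plain time measurement.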