So the basic problem is that my built executable is 4GB in size with debug symbols turned on (between 75 MB and 300 MB with no debug symbols and varying optimisation levels). How can I diagnose/analyse where all these symbols are coming from, and which are the biggest offenders in terms of taking up space? I have found some questions on reducing the non-debug executable size (though they have not been terribly illuminating), but here I am mainly concerned with reducing the debug symbol clutter. The executable is so large that it takes gdb a significant amount of time to load up all the symbols, which is hindering debugging. Perhaps reducing the code bloat is the fundamental task, but I would first like to know where my 4GB is being spent.
Running the executable through 'size --format=SysV' I get the following output:
section size addr
.interp 28 4194872
.note.ABI-tag 32 4194900
.note.gnu.build-id 36 4194932
.gnu.hash 714296 4194968
.dynsym 2728248 4909264
.dynstr 13214041 7637512
.gnu.version 227354 20851554
.gnu.version_r 528 21078912
.rela.dyn 37680 21079440
.rela.plt 15264 21117120
.init 26 21132384
.plt 10192 21132416
.text 25749232 21142608
.fini 9 46891840
.rodata 3089441 46891872
.eh_frame_hdr 584228 49981316
.eh_frame 2574372 50565544
.gcc_except_table 1514577 53139916
.init_array 2152 56753888
.fini_array 8 56756040
.jcr 8 56756048
.data.rel.ro 332264 56756064
.dynamic 992 57088328
.got 704 57089320
.got.plt 5112 57090048
.data 22720 57095168
.bss 1317872 57117888
.comment 44 0
.debug_aranges 2978704 0
.debug_info 278337429 0
.debug_abbrev 1557345 0
.debug_line 13416850 0
.debug_str 3620467085 0
.debug_loc 236168202 0
.debug_ranges 37473728 0
Total 4242540803
from which I guess we can see that '.debug_str' takes up ~3.6 GB. I don't know exactly what '.debug_str' is, but I guess it might literally be the string names of the debug symbols? So is this telling me that the demangled names of my symbols are just insanely big? How can I figure out which ones and fix them?
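To make a table like the one above easier to read, the sections can be ranked by size with their share of the total. This is only a sketch: the helper name rank_sections is made up for illustration, and it assumes the input is 'size --format=SysV' output in which section rows start with a dot.

```shell
# Rank ELF sections by size, largest first, with each section's share of the
# summed section sizes. Reads `size --format=SysV` output on stdin.
# rank_sections is a hypothetical helper name, not a binutils tool.
rank_sections() {
    awk '
        # Section rows look like "<name> <size> <addr>" and names start with
        # a dot, so the header and the "Total" row are skipped.
        $1 ~ /^\./ && $2 ~ /^[0-9]+$/ { size[$1] = $2; total += $2 }
        END {
            for (name in size)
                printf "%.0f %s %.1f%%\n", size[name], name, 100 * size[name] / total
        }
    ' | sort -rn | head -n "${1:-10}"   # optional argument: how many rows to show
}

# Usage (my_executable stands for the binary from the question):
#   size --format=SysV my_executable | rank_sections 5
```

The sizes are printed with %.0f rather than %d because some awk implementations clamp %d to 32 bits, which a 3.6 GB section would overflow.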
I guess I can somehow do something with 'nm', directly inspecting the symbol names, but the output is enormous and I'm not sure how best to search it. Are there any tools to do this kind of analysis?
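As far as I understand, 'nm' lists the symbol table, whereas '.debug_str' is the string table that the DWARF debug info refers to, so it may be more direct to dump that section with 'readelf --string-dump' and rank the entries by length. A rough sketch follows. The helper name longest_debug_strings is hypothetical, and the assumed line format ("  [  offset]  string") is what GNU readelf prints on the systems I have seen.

```shell
# Print the N longest strings from a `readelf --string-dump` listing on stdin,
# each prefixed with its length. longest_debug_strings is a hypothetical name.
longest_debug_strings() {
    # Entries are assumed to look like "  [   1a2b]  some string"; the sed
    # expression strips the hex offset prefix and drops every other line.
    sed -n 's/^ *\[ *[0-9a-f]*]  //p' |
        awk '{ print length($0), $0 }' |
        sort -rn |
        head -n "${1:-20}"              # optional argument: how many to show
}

# Usage (my_executable stands for the binary from the question):
#   readelf --string-dump=.debug_str my_executable | longest_debug_strings 20
```

On a multi-gigabyte section this will be slow, so redirecting the readelf output to a file once and reusing it is probably worthwhile.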
The compiler used was 'c++ (GCC) 4.9.2'. I should also mention that I am working in a Linux environment.
To remove debugging symbols from a binary (which must be an a.out or ELF binary), run strip --strip-debug filename. Wildcards can be used to treat multiple files (use something like strip --strip-debug $LFS/tools/bin/*).
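With debug symbols present, the on-disk footprint is larger and a debugger takes longer to load the file. The debug sections are not part of the loadable program image, so running the program outside a debugger should be largely unaffected. With GCC, -g is designed not to change the generated code, so it can be combined with optimization without making the optimized code slower; the catch is that optimized code is harder to follow in a debugger.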
To check if there's debug info inside the kernel object, you can add the following at the end of the objdump command: '| grep debug'. If this string is found, you know the kernel object contains debug information. If not, then it's a "clean" kernel object.
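One way to do that check is to filter the section headers for names beginning with '.debug_'. This is a minimal sketch, assuming the listing comes from 'objdump -h', where the section name is the second column. The helper name debug_sections and the file name my_module.ko are hypothetical.

```shell
# Print the names of debug sections found in an `objdump -h` listing on stdin.
# Prints nothing when the object carries no .debug_* sections.
# debug_sections is a hypothetical helper name.
debug_sections() {
    # Header rows look like "<idx> <name> <size> <vma> <lma> <file off> <algn>";
    # the flag lines in between have no dotted name in column 2.
    awk '$2 ~ /^\.debug_/ { print $2 }'
}

# Usage (my_module.ko is a placeholder file name):
#   objdump -h my_module.ko | debug_sections
```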
A debug symbol is a special kind of symbol that attaches additional information to the symbol table of an object file, such as a shared library or an executable.
So I have tracked down the main culprit by doing the following, based mostly on John Zwinck's answer. Essentially I followed his suggestion to run "strings" on the executable and analyzed the output.
strings my_executable > exec_strings.txt
I then sorted the output mostly following mindriot's method:
cat exec_strings.txt | awk '{ print length, $0 }' | sort -n -s | cut -d" " -f2- > exec_strings_sorted.txt
and had a look at the longest strings. Indeed it all seemed to be some insane template bloat, from a particular library. I then did a little more counting like:
cat exec_strings.txt | wc -l
2928189
cat exec_strings.txt | grep <culprit_libname> | wc -l
1108426
to see that of the approximately 3 million strings that are extracted, it seems like ~1 million of them were coming from this library. Finally, doing
cat exec_strings.txt | wc -c
3659369876
cat exec_strings.txt | grep <culprit_libname> | wc -c
3601918899
it became apparent that these million strings are all super long and constitute the great bulk of the debug symbol garbage. So at least now I can focus on this one library while trying to remove the root of the problem.
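The four counts above can also be collected in a single pass over the file. This is a sketch rather than what I actually ran: the function name string_share is made up, the pattern is matched as a fixed substring, and LC_ALL=C is set so that length is counted in bytes, which should line up with what 'wc -c' reports.

```shell
# Report how many strings contain a fixed substring and what share of the
# total bytes they account for. string_share is a hypothetical helper name.
# Arguments: $1 = substring to look for, $2 = file with one string per line.
string_share() {
    pattern=$1
    file=$2
    # Note: awk -v interprets backslash escapes, so keep the pattern free of them.
    LC_ALL=C awk -v pat="$pattern" '
        { n++; bytes += length($0) + 1 }              # +1 for the newline, as wc -c counts it
        index($0, pat) { hit++; hit_bytes += length($0) + 1 }
        END {
            share = (bytes ? 100 * hit_bytes / bytes : 0)
            printf "%d of %d strings, %.1f%% of bytes\n", hit, n, share
        }
    ' "$file"
}

# Usage (the pattern is whatever identifies the suspect library's names):
#   string_share 'some_namespace::' exec_strings.txt
```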
One trick I use is to run 'strings' on the executable, which will print all those long (probably due to templates) and numerous (ditto) debug symbol names. You can pipe it to 'sort | uniq -c | sort -n' and look at the results. In many large C++ executables you'll see patterns like this:
my_template<std::basic_string<char, traits, allocator>, std::unordered_map<std::basic_string<char, traits, allocator>, 1L>
my_template<std::basic_string<char, traits, allocator>, std::unordered_map<std::basic_string<char, traits, allocator>, 2L>
my_template<std::basic_string<char, traits, allocator>, std::unordered_map<std::basic_string<char, traits, allocator>, 3L>
You get the idea.
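Names that differ only in a numeric template argument, like the ones above, will not collapse under a plain 'uniq -c'. One possible refinement, which is an addition to the pipeline and not part of the original trick, is to replace every run of digits with a placeholder before counting. The function name group_instantiations is hypothetical, and the substitution is crude: it also rewrites digits inside ordinary identifiers.

```shell
# Group strings that differ only in numeric parts and count each group,
# most frequent first. group_instantiations is a hypothetical helper name.
group_instantiations() {
    # Replace every run of digits with "N" so that e.g. "1L" and "2L" both
    # become "NL", then count identical lines.
    sed 's/[0-9][0-9]*/N/g' |
        sort |
        uniq -c |
        sort -rn
}

# Usage (my_executable stands for the binary being examined):
#   strings my_executable | group_instantiations | head -n 20
```

The groups with the highest counts are the template families that are instantiated most often, which is a reasonable place to start looking.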
In some cases I've decided to simply reduce the amount of templating. Sometimes it gets out of hand. Other times you may win something by using explicit template instantiation, or compiling specific parts of your project without debugging symbols, or even disabling RTTI if you don't rely on dynamic_cast or typeid.