Reducing the footprint of debug symbols (executable is bloated to 4 GB)

So the basic problem is that my built executable is 4GB in size with debug symbols turned on (between 75 MB and 300 MB with no debug symbols and varying optimisation levels). How can I diagnose/analyse where all these symbols are coming from, and which are the biggest offenders in terms of taking up space? I have found some questions on reducing the non-debug executable size (though they have not been terribly illuminating), but here I am mainly concerned with reducing the debug symbol clutter. The executable is so large that it takes gdb a significant amount of time to load up all the symbols, which is hindering debugging. Perhaps reducing the code bloat is the fundamental task, but I would first like to know where my 4GB is being spent.

Running the executable through 'size --format=SysV', I get the following output:

section                    size       addr
.interp                      28    4194872
.note.ABI-tag                32    4194900
.note.gnu.build-id           36    4194932
.gnu.hash                714296    4194968
.dynsym                 2728248    4909264
.dynstr                13214041    7637512
.gnu.version             227354   20851554
.gnu.version_r              528   21078912
.rela.dyn                 37680   21079440
.rela.plt                 15264   21117120
.init                        26   21132384
.plt                      10192   21132416
.text                  25749232   21142608
.fini                         9   46891840
.rodata                 3089441   46891872
.eh_frame_hdr            584228   49981316
.eh_frame               2574372   50565544
.gcc_except_table       1514577   53139916
.init_array                2152   56753888
.fini_array                   8   56756040
.jcr                          8   56756048
.data.rel.ro             332264   56756064
.dynamic                    992   57088328
.got                        704   57089320
.got.plt                   5112   57090048
.data                     22720   57095168
.bss                    1317872   57117888
.comment                     44          0
.debug_aranges          2978704          0
.debug_info           278337429          0
.debug_abbrev           1557345          0
.debug_line            13416850          0
.debug_str           3620467085          0
.debug_loc            236168202          0
.debug_ranges          37473728          0
Total                4242540803

From this I guess we can see that '.debug_str' takes up ~3.6 GB. I don't know 100% what '.debug_str' is, but I guess it might literally hold the string names of the debug symbols? So is this telling me that the demangled names of my symbols are just insanely big? How can I figure out which ones they are and fix them?

I guess I can somehow do something with 'nm', directly inspecting the symbol names, but the output is enormous and I'm not sure how best to search it. Are there any tools to do this kind of analysis?
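One way to do this kind of analysis, sketched under the assumption that GNU binutils (readelf) and standard sed/awk/sort are available: in DWARF, '.debug_str' is a string table that entries in '.debug_info' point into, so it typically holds type and function names (for template instantiations GCC usually spells out the full argument list in the name) as well as mangled linkage names. Dumping only that section with 'readelf --string-dump=.debug_str' avoids the noise that 'strings' picks up from the rest of the binary. The helper function names below are made up for this example.

```shell
# Sketch (assumes GNU binutils' readelf plus POSIX sed/awk/sort on PATH):
# list the strings stored in an ELF file's .debug_str section, longest first.

# Reads `readelf --string-dump` output on stdin and keeps only the string
# text, dropping the header line and the "  [offset]  " prefix of each entry.
strip_string_dump() {
    sed -n 's/^ *\[ *[0-9a-f]*\]  //p'
}

# Reads one string per line on stdin; prints "<length> <string>", longest first.
# LC_ALL=C makes awk's length() count bytes rather than characters.
rank_by_length() {
    LC_ALL=C awk '{ print length($0), $0 }' | sort -k1,1nr
}

# Usage: longest_debug_strings my_executable | head -n 20
longest_debug_strings() {
    readelf --string-dump=.debug_str "$1" | strip_string_dump | rank_by_length
}
```

Looking at the first few dozen lines of that output is usually enough to see which library or template family the longest names come from.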

The compiler used was 'c++ (GCC) 4.9.2', and I should mention that I am working in a Linux environment.

asked Oct 25 '16 at 14:10 by Ben Farmer



2 Answers

So I have tracked down the main culprit by doing the following, based mostly on John Zwinck's answer. Essentially, I followed his suggestion to run 'strings' on the executable and analyzed the output.

strings my_executable > exec_strings.txt

I then sorted the output mostly following mindriot's method:

cat exec_strings.txt | awk '{ print length, $0 }' | sort -n -s | cut -d" " -f2- > exec_strings_sorted.txt

and had a look at the longest strings. Indeed, it all seemed to be insane template bloat from one particular library. I then did a little more counting:

cat exec_strings.txt | wc -l
2928189
cat exec_strings.txt | grep <culprit_libname> | wc -l
1108426

to see that, of the roughly 2.9 million extracted strings, about 1.1 million come from this library. Finally, running

cat exec_strings.txt | wc -c
3659369876
cat exec_strings.txt | grep <culprit_libname> | wc -c
3601918899

it became apparent that these million or so strings are all very long: they account for about 3.60 GB of the 3.66 GB total (roughly 98%), i.e. the great bulk of the debug symbol garbage. So at least now I can focus on this one library while trying to remove the root of the problem.
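The two 'grep | wc' passes above can also be combined into a single awk pass that prints both the line and byte share. This is a sketch, not the method used above: 'share_of_pattern' is a hypothetical helper name, the pattern is an awk extended regular expression (slightly different syntax from grep's default basic regex), and LC_ALL=C makes length() count bytes so the totals agree with 'wc -c'.

```shell
# Sketch: in one pass over a strings dump, count how many lines and bytes
# match a pattern (an awk ERE). Bytes include the trailing newline, as wc -c does.
# Usage: share_of_pattern 'PATTERN' < exec_strings.txt
share_of_pattern() {
    LC_ALL=C awk -v pat="$1" '
        { bytes = length($0) + 1; total_lines++; total_bytes += bytes }
        $0 ~ pat { hit_lines++; hit_bytes += bytes }
        END {
            pct = total_bytes ? 100 * hit_bytes / total_bytes : 0
            # %.0f rather than %d so multi-GB byte counts print correctly in every awk
            printf "%d of %d lines, %.0f of %.0f bytes (%.1f%%)\n",
                   hit_lines, total_lines, hit_bytes, total_bytes, pct
        }'
}
```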

answered Oct 21 '22 at 00:10 by Ben Farmer


One trick I use is to run strings on the executable, which will print all those long (probably due to templates) and numerous (ditto) debug symbol names. You can pipe it to sort | uniq -c | sort -n and look at the results. In many large C++ executables you'll see patterns like this:

my_template<std::basic_string<char, traits, allocator>, std::unordered_map<std::basic_string<char, traits, allocator>, ...>, 1L>
my_template<std::basic_string<char, traits, allocator>, std::unordered_map<std::basic_string<char, traits, allocator>, ...>, 2L>
my_template<std::basic_string<char, traits, allocator>, std::unordered_map<std::basic_string<char, traits, allocator>, ...>, 3L>

You get the idea.
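Names like these differ only in a numeric template argument, so counting exact duplicates with 'uniq -c' keeps them apart. A variation on that pipeline (not part of the original answer): replace every run of digits with N before grouping, and rank the groups by total bytes rather than by count. 'template_families' is a hypothetical helper name, and collapsing all digits is a crude heuristic that will also merge unrelated names that happen to differ only in numbers.

```shell
# Sketch: group strings that differ only in digit runs (e.g. the 1L/2L/3L
# template arguments above) and print "<bytes> <count> <family>", largest last.
# Bytes are measured on the original strings, including the newline.
template_families() {
    LC_ALL=C awk '
        {
            n = length($0) + 1
            key = $0
            gsub(/[0-9]+/, "N", key)   # 1L, 2L, 3L all become NL
            count[key]++
            bytes[key] += n
        }
        END { for (k in count) printf "%.0f %d %s\n", bytes[k], count[k], k }' \
        | sort -k1,1n
}
# Usage: strings my_executable | template_families | tail -n 20
```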

In some cases I've decided to simply reduce the amount of templating. Sometimes it gets out of hand. Other times you may win something by using explicit template instantiation, or compiling specific parts of your project without debugging symbols, or even disabling RTTI if you don't rely on dynamic_cast or typeid.
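For the "compile specific parts without debugging symbols" option, a plain shell build could pick the debug level per file, as sketched below. HEAVY_SOURCES, the file name culprit_wrapper.cpp and debug_flag_for are hypothetical. With GCC, -g1 keeps only minimal information (functions, external variables and line tables, enough for backtraces), while -g0 turns debug info off; in a real build system you would set the same per-file flags there instead.

```shell
# Sketch: full debug info (-g) for most translation units, but only minimal
# debug info (-g1) for the template-heavy ones. HEAVY_SOURCES and the file
# name are hypothetical placeholders.
HEAVY_SOURCES="culprit_wrapper.cpp"

# Print the debug flag to use for the source file named in $1.
debug_flag_for() {
    for heavy in $HEAVY_SOURCES; do
        if [ "$1" = "$heavy" ]; then
            printf '%s\n' "-g1"
            return
        fi
    done
    printf '%s\n' "-g"
}

# Usage (swap -g1 for -g0 to drop debug info for those files entirely):
#   for f in *.cpp; do c++ $(debug_flag_for "$f") -c "$f"; done
```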

answered Oct 21 '22 at 00:10 by John Zwinck