I noticed that even when I don't use iostream or related I/O libraries, the binaries produced by MinGW are sometimes still unreasonably large. For example, a program that uses only vector and cstdio, compiled with -O2 -flto, can be as large as 2MB! I ran nm main.exe > e.txt and was shocked to see all the iostream-related functions in it.
After some googling, I learned to use -ffunction-sections -Wl,-gc-sections, which reduces the program size from 2MB to ~300KB (100+KB with -s). Excellent!
To further test the effect of -ffunction-sections -Wl,-gc-sections, here is another program:
#include <cstdio>
#include <vector>
#include <tuple>
#include <algorithm>
#include <chrono>
#include <windows.h>
#undef min

struct Point {
    int x, y;
};

constexpr int length = 5;
constexpr int half_length() {
    return length & 1 ? length : length - 1;
}

template<class F>
int func_template(F&& f) {
#ifdef _MSC_VER
    puts(__FUNCSIG__);
#else
    puts(__PRETTY_FUNCTION__);
#endif
    printf("\n");
    return f();
}

struct fake_func {
    int operator()() const { return 59; };
};

template<class F, class... Args>
int pass_args(F&& f, Args&&... args) {
#ifdef _MSC_VER
    puts(__FUNCSIG__);
#else
    puts(__PRETTY_FUNCTION__);
#endif
    printf("\n");
    return f(std::forward<Args>(args)...);
}

template<class T>
T min(T x) {
    return x;
}

template<class T, class... Args>
T min(T x, Args... args) {
    T y = min(args...);
    return x < y ? x : y;
}

void type_verifier(int x) {
    printf("%dd ", x);
}

void type_verifier(char x) {
    printf("'%c' ", x);
}

void type_verifier(double x) {
    printf("%lff ", x);
}

template<class T>
void type_verifier(T x) {
    printf("unknown ");
}

template<class T, class... Args>
void type_verifier(T x, Args... args) {
    type_verifier(x);
    type_verifier(args...);
}

int bufLen;
char buf[100];

template<class... Args>
inline int send(Args... args) {
    bufLen = sprintf(buf, std::forward<Args>(args)...);
    return bufLen;
}

namespace std {
    inline namespace v1 {
        void func() {
            printf("I am v1\n");
        }
    }
    namespace v2 {
        void func() {
            printf("I am v2\n");
        }
    }
}

int main() {
    std::vector<int> v {1, 2, 3, 4, 5};
    for (auto &i : v) printf("%d ", i);
    printf("\n");

    Point p {1, 2};
    printf("%d %d\n", p.x, p.y);

    auto t = std::make_tuple("Hello World", 12);
    printf("%s %d\n", std::get<0>(t), std::get<1>(t));

    int a, b;
    auto f = []() { return std::make_tuple(1, 2); };
    std::tie(a, b) = f();
    printf("%d %d\n", a, b);

    //int test_constexpr[half_length() + 4];

    int ft = func_template([]{ return 42; });
    printf("func_template: %d\n", ft);
    ft = func_template(fake_func {});
    printf("func_template: %d\n", ft);

    ft = pass_args([](int x, int y) { return x + y; }, 152, 58);
    printf("pass_args: %d\n", ft);
    ft = pass_args([](int n, const char *m) {
        for (int i = 0; i < n; i++) printf("%c ", m[i]);
        printf("\n");
        return 0;
    }, 5, "Hello");

    printf("min: %d\n", min(3, 4, 2, 1, 5));

    type_verifier(12, 'A', 0.5, "Hello");
    printf("\n");

    /* send("Hello World");
    send("%d", 1);
    send("%d", "1234");
    sprintf(buf, "%d", "123");*/

    std::func();
    std::v1::func();
    std::v2::func();

    std::rotate(v.begin(), v.begin() + 2, v.end());
    for (auto &i : v) printf("%d ", i);
    printf("\n");

    auto start = std::chrono::steady_clock::now();

    std::vector<int> x {2, 4, 2, 0, 5, 10, 7, 3, 7, 1};
    printf("insertion sort: ");
    for (auto &i: x) printf("%d ", i);
    printf("\n");
    // insertion sort
    for (auto i = x.begin(); i != x.end(); ++i) {
        std::rotate(std::upper_bound(x.begin(), i, *i), i, i+1);
        for (auto &j: x) printf("%d ", j);
        printf("\n");
    }

    std::vector<int> heap {7, 5, 3, 4, 2};
    std::make_heap(heap.begin(), heap.end());
    std::pop_heap(heap.begin(), heap.end());
    printf("Pop heap (%d)\n", heap.back());
    heap.pop_back();
    heap.push_back(1);
    std::push_heap(heap.begin(), heap.end());
    std::sort_heap(heap.begin(), heap.end());
    for (auto &i: heap) printf("%d ", i);
    printf("\n");

    auto end = std::chrono::steady_clock::now();
    auto diff = end - start;
    printf("time: %I64d ms\n",
        std::chrono::duration_cast<std::chrono::milliseconds>(diff).count());

    {
        auto u = v;
        std::move_backward(u.begin(), u.begin() + u.size() - 1, u.begin() + u.size());
        for (auto &i : u) printf("%d ", i);
        printf("\n");
    }
    {
        auto u = v;
        std::move(u.begin() + 1, u.begin() + u.size(), u.begin());
        for (auto &i : u) printf("%d ", i);
        printf("\n");
    }

    start = std::chrono::steady_clock::now();
    Sleep(2000);
    end = std::chrono::steady_clock::now();
    diff = end - start;
    printf("time: %I64d ms\n",
        std::chrono::duration_cast<std::chrono::milliseconds>(diff).count());

    std::chrono::steady_clock::time_point before;
    before = std::chrono::steady_clock::now();
    Sleep(2000);
    auto after = std::chrono::steady_clock::now();
    printf("%f seconds\n", std::chrono::duration<double>(after - before).count());

    return 0;
}

To my disappointment, the final program is once again > 2MB.
Interestingly, cl.exe consistently removes all the iostream-related functions even when I don't use /O2 or any other flags, just cl.exe main.cpp. (For the code above, cl.exe produces a 100+KB binary.)
Did I miss any other useful gcc flags for this?
Specification:
Compare with Linux
I compared the binaries produced by gcc 4.9.2 (Linux) and gcc 4.9.3 (mingw-w64) for the above code (with windows.h and Sleep removed).
Compile flag
g++ -o c++11 c++11.cpp -std=c++11 -static-libgcc -static-libstdc++ -ffunction-sections -Wl,-gc-sections -O2
Linux gcc successfully stripped away the iostream-related and other unused functions without needing -flto, while mingw-w64 gcc just can't do it properly.
Windows only supports the PE format while Linux supports the ELF format, which allows Linux to use the Gold linker. Maybe this is the explanation?
I eventually filed a bug at https://sourceforge.net/p/mingw-w64/bugs/578/ . Let's hope it gets some attention!
The compiler takes this feedback file from the prior build and uses it to place unused functions into their own ELF sections in the object file. The linker can then place them in the unused sections and remove them from the build.
--gc-sections decides which input sections are used by examining symbols and relocations. The section containing the entry symbol and all sections containing symbols undefined on the command-line will be kept, as will sections containing symbols referenced by dynamic objects.
Try stripping debug and symbol info from the static libstdc++ via -Wl,--strip-all. This reduced my executable from 9M to 670K on Cygwin (13x) and from 6M to 80K on Ubuntu (80x).