While developing a header-only library, I'd like to make sure that a given string is embedded in all binaries that use my header, even if the compiler is configured to optimize away unused constants, and the binary gets stripped.
The embedding shouldn't have any side-effects (apart from making the resulting binary a little bit bigger).
I don't know how people are going to use the headers, but
My trivial attempt amounts to:
static char frobnozzel_version_string[] = "Frobnozzel v0.1; © 2019 ACME; GPLv3";
... but that gets easily removed during the build (since the string is never actually used, it's easy prey for an optimizing compiler).
So the question is: is it possible to embed a string in any binary that includes a given header, that won't get optimized/stripped away by usual strategies to build "Release" binaries?
I'm aware that anybody who is using the library can just (manually) remove whatever I put in, but let's assume people just use the header "as is".
Context: the headers in question are released under the GPL, and I'd like to be able to check whether the users actually comply with the license.
You can embed assembly pseudo-ops in your header, and the string they emit should stay in the binary (although it's never used):
asm(".ascii \"Frobnozzel v0.1; © 2019 ACME; GPLv3\"\n\t");
Note that this is GCC/Clang-specific.
An alternative for MSVC would be using #pragma comment or __asm db:
__asm db "Frobnozzel v0.1; © 2019 ACME; GPLv3"
#pragma comment(user, "Frobnozzel v0.1; © 2019 ACME; GPLv3")
Here's an example:
chronos@localhost ~/Downloads $ cat file.c
#include <stdio.h>
#include "file.h"
int main(void)
{
    puts("The string is never used.");
}
chronos@localhost ~/Downloads $ cat file.h
#ifndef FILE_H
#define FILE_H 1
#if defined(__GNUC__)
asm(".ascii \"Frobnozzel v0.1; © 2019 ACME; GPLv3\"\n\t");
#elif defined(_MSC_VER)
/* _WIN32 is also defined on 64-bit targets, so test _WIN64 first */
# if defined(_WIN64)
#  pragma comment(user, "Frobnozzel v0.1; © 2019 ACME; GPLv3")
# else
__asm db "Frobnozzel v0.1; © 2019 ACME; GPLv3"
# endif
#endif
#endif /* FILE_H */
chronos@localhost ~/Downloads $ gcc file.c
chronos@localhost ~/Downloads $ grep "Frobnozzel v0.1; © 2019 ACME; GPLv3" a.out
Binary file a.out matches
chronos@localhost ~/Downloads $
Replace the gcc command with clang and the result is the same.
For 64-bit Windows, this requires either replacing user with the deprecated exestr or creating a resource file that embeds the string in the executable file. As written, the string will be removed when linking.
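On GCC and Clang, another option is to keep an ordinary array but mark it with the used attribute, which asks the compiler to emit the object even though nothing references it. The following is only a minimal sketch, not a drop-in replacement for the asm approach: the macro FROBNOZZEL_KEEP and the accessor frobnozzel_marker are hypothetical names, and a linker run with section garbage collection (for example --gc-sections) may still discard the string.

```c
/* FROBNOZZEL_KEEP is a hypothetical helper macro, not part of any library. */
#if defined(__GNUC__) || defined(__clang__)
#  define FROBNOZZEL_KEEP __attribute__((used))
#else
#  define FROBNOZZEL_KEEP /* no equivalent assumed: the string may be dropped */
#endif

/* static: every translation unit that includes the header gets its own copy,
   so including it from several source files causes no duplicate symbols. */
static const char frobnozzel_version_string[] FROBNOZZEL_KEEP =
    "Frobnozzel v0.1; © 2019 ACME; GPLv3";

/* Hypothetical accessor, present only so the marker can be inspected. */
static inline const char *frobnozzel_marker(void)
{
    return frobnozzel_version_string;
}
```

Because the array is static, a program built from many source files may end up with several copies of the string; that only costs a few bytes per object file.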
You might not be able to force a value into the compilation unit, but you can force a symbol by defining a global variable in the header, e.g.: long using_my_library_version_1_2_3;
The symbol will be accessible externally in the final binary file and could be tested against (though, like any solution, it could be circumvented, not to mention that the header itself could be altered).
EDIT: To clarify (due to a comment): don't use a static variable.
A global variable has external linkage by default and will not be optimized away (in case other objects loading the binary use the identifier).
As mentioned in the comments, the global variable's identifier (name) is the string in this approach.
However, when compiling executables (and kernels), identifiers can be stripped from the final binary by compiling with -s. This is often done by embedded system developers and by people who enjoy making debugging a living hell (even more than it is anyway).
A quick example:
// main.c
int this_is_example_version_0_0_1; /* variable name will show in the file */
int main(void) {
    /* placed anywhere to avoid the "not used" warning: */
    (void)this_is_example_version_0_0_1;
    return 0;
}
// extra.c
int this_is_example_version_0_0_1; /* repeat line to your heart's content */
int this_is_example_version_0_0_1; /* (i.e., if header has no include guard) */
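Compile (GCC 10 and later and Clang 11 and later default to -fno-common and reject the duplicate definitions at link time, so -fcommon is passed explicitly):
$ cc -xc -fcommon -o a -Wall -O2 main.c extra.c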
List all identifiers/names (will show global):
$ nm ./a | grep "this_is_example_version"
Test for string in binary file using:
$ grep -F "this_is_example_version" ./a
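The same check can be done from a small C helper, which may be handy when many binaries have to be scanned. This is a rough sketch under the assumption that the file fits in memory; file_contains is a hypothetical name, not an existing API.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the file at `path` contains the bytes of
   `needle`, 0 if it does not, and -1 if the file cannot be read. */
int file_contains(const char *path, const char *needle)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    /* Determine the file size, then read the whole file into memory. */
    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return -1; }
    long size = ftell(fp);
    if (size < 0) { fclose(fp); return -1; }
    rewind(fp);

    unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf == NULL) { fclose(fp); return -1; }
    size_t got = fread(buf, 1, (size_t)size, fp);
    fclose(fp);

    /* Byte-wise search: strstr() would stop at the first NUL byte,
       and binaries are full of them. */
    size_t n = strlen(needle);
    int found = (n == 0);
    for (size_t i = 0; !found && i + n <= got; i++) {
        if (memcmp(buf + i, needle, n) == 0)
            found = 1;
    }
    free(buf);
    return found;
}
```

Called as file_contains("./a", "this_is_example_version"), it should report a match for the binary built above, as long as the symbol table has not been stripped.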
Funny facts about C that make this solution possible...:
C gives external linkage by default to both function and object declarations at file scope, as if they had been declared extern (6.2.2, paragraph 5).
According to section 6.2.2 ("Linkages of identifiers"), "each declaration of a particular identifier with external linkage denotes the same object or function."
This means that duplicate declarations in the global scope will be collated to a single declaration.
Variable declarations and variable definitions look the same when the variable is placed in the global scope and all of its bits are set to zero.
This is because global variables are initialized to zero by default. Hence, compilers can't tell if int foo; is a definition (int foo = 0;) or a declaration (extern int foo;).
Because of this "identity" and these rules, compilers traditionally convert ambiguous global variable declarations/definitions (tentative definitions, in the standard's terms) into "weak" declarations, emitted as common symbols, to be resolved by the linker.
This means that if you define a global variable without the extern keyword and without a value, the ambiguous declaration/definition will force the compiler to emit such a symbol, which will be exposed in the final binary.
This symbol could be used to identify the fact that the header was used somewhere in the program.
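The single-file half of this can be checked directly. The sketch below repeats the declaration twice in one translation unit, which the C standard allows for tentative definitions (6.9.2); marker_value is a hypothetical accessor added only so the result can be observed. It covers just the case the standard guarantees, since merging across several source files is toolchain behavior.

```c
/* Two tentative definitions of the same object in one translation unit.
   C merges them into a single definition initialized to zero. */
int this_is_example_version_0_0_1;
int this_is_example_version_0_0_1; /* e.g. header included twice, no guard */

/* Hypothetical accessor, added only so the value can be checked. */
int marker_value(void)
{
    return this_is_example_version_0_0_1;
}
```

Note that this is valid C only; a C++ compiler treats the second line as a redefinition and rejects it.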